Abstract

Under the assumption that individuals know the conditional distributions of signals
given the payoff-relevant parameters, existing results conclude that, as individuals
observe infinitely many signals, their beliefs about the parameters will eventually merge.
We first show that these results are fragile when individuals are uncertain about the
signal distributions: given any such model, a vanishingly small individual uncertainty
about the signal distributions can lead to a substantial (non-vanishing) amount of
difference between the asymptotic beliefs. We then characterize the conditions under which
a small amount of uncertainty leads only to a small amount of asymptotic disagreement.
According to our characterization, this is the case if the uncertainty about the signal
distributions is generated by a family with "rapidly-varying tails" (such as the normal
or the exponential distributions). However, when this family has "regularly-varying
tails" (such as the Pareto, the log-normal, and the t-distributions), a small amount of
uncertainty leads to a substantial amount of asymptotic disagreement.
1 Introduction

The common prior assumption is one of the cornerstones of modern economic analysis.
Most models postulate that the players in a game have the "same model of the
world," or more precisely, that they have a common prior about the game form and
payoff distributions. For example, they all agree that some state (payoff-relevant
parameter vector) $\theta$ is drawn from a known distribution $G$, even though each may
also have additional information about some components of $\theta$. The typical
justification for the common prior assumption comes from learning; individuals, through
their own experiences and the communication of others, will have access to a history of
events informative about the state $\theta$, and this process will lead to "agreement"
among individuals about the distribution of $\theta$. A strong version of this view is
expressed in Savage (1954, p. 48) as the statement that a Bayesian individual, who does
not assign zero probability to "the truth," will learn it eventually as long as the
signals are informative about the truth. An immediate implication of this result is that
two individuals who observe the same sequence of signals will ultimately agree, even if
they start with very different priors.

Despite this powerful intuition, disagreement is the rule rather than the exception in
practice. For example, there is typically considerable disagreement even among
economists working on a certain topic. Similarly, there are deep divides about religious
beliefs within populations with shared experiences. In most cases, the source of
disagreement does not seem to be differences in observations or experiences. Instead,
individuals appear to interpret the available data differently. For example, an estimate
showing that subsidies increase investment is interpreted very differently by two
economists starting with different priors. An economist believing that subsidies have no
effect on investment appears more likely to judge the data or the methods leading to this
estimate to be unreliable and thus to attach less importance to this evidence.

In this paper, we investigate the outcome of learning about an underlying state by
two Bayesian individuals with different priors when they are possibly uncertain about
the conditional distributions (or interpretations) of signals. This leads to a potential
identification problem, as the same long-run frequency of signals may result from
different combinations of payoff-relevant variables and different interpretations of the
signals. Hence, even though the individuals will learn the asymptotic frequency of
signals, they may not always be able to infer the state $\theta$, and initial differences
in their beliefs may translate into differences in asymptotic beliefs. When the amount of
uncertainty is small, the identification problem is also small in the sense that each
individual finds it highly likely that he will eventually assign high probability to the
true state. One may then expect that the asymptotic beliefs of the two individuals about
the underlying states should be close as well. If so, the common prior assumption would
be a good approximation when players have a long common experience and face only a small
amount of uncertainty about how the signals are related to the states.

Our focus in this paper is to investigate the validity of this line of argument. In
particular, we study whether asymptotic agreement is continuous at certainty. Asymptotic
agreement is continuous at certainty if a small amount of uncertainty leads only to a
small amount of disagreement asymptotically. Our main result shows that asymptotic
agreement is discontinuous at certainty for every model: for every model there is a
vanishingly small amount of uncertainty that is sufficient for each individual to assign
probability nearly 1 to the event that they will asymptotically hold significantly
different beliefs about the underlying state. This result implies that the learning
foundations of the common prior assumption are not as strong as one might have thought.

Before explaining our main result and its intuition, it is useful to provide some details
about the environment we study. Two individuals with given priors observe a sequence
of signals, $\{s_t\}_{t=1}^{n}$, and form their posteriors about the state $\theta$. The
only non-standard feature of the environment is that these individuals may be uncertain
about the distribution of signals conditional on the underlying state. In the simplest
case where the state and the signal are binary, e.g., $\theta \in \{A, B\}$ and
$s_t \in \{a, b\}$, this implies that $\Pr(s_t = \theta \mid \theta) = p_\theta$ is not a
known number, but individuals also have a prior over $p_\theta$, say given by a cumulative
distribution function $F_\theta^i$ for each agent $i = 1, 2$. We refer to $F_\theta^i$ as
individual $i$'s subjective probability distribution and to its density $f_\theta^i$ as
subjective (probability) density. This distribution, which can differ among individuals,
is a natural measure of their uncertainty about the informativeness of signals. When
subjective probability distributions are non-degenerate, individuals will have some
latitude in interpreting the sequence of signals they observe. The presence of subjective
probability distributions over the interpretation of the signals introduces an
identification problem and implies that, in contrast to the standard learning
environments, asymptotic learning and asymptotic agreement are not guaranteed. In
particular, when each $F_\theta^i$ has full support for each $\theta$, there will not be
asymptotic learning or asymptotic agreement. Lack of asymptotic agreement implies that
two individuals with different priors observing the same sequence of signals will reach
different posterior beliefs even after observing infinitely many signals. Moreover,
individuals attach ex ante probability 1 to the event that they will disagree after
observing the sequence of signals.

Now consider a family of subjective density functions, $\{f_{\theta,m}^i\}$, becoming
increasingly concentrated around a single point, thus converging to certainty. When $m$
is large (and uncertainty is small), each individual is almost certain that he will
assign probability nearly 1 to the true value of $\theta$. Despite this approximate
asymptotic learning, our main result shows that asymptotic agreement may fail. In
particular, for any $(p_A^1, p_B^1, p_A^2, p_B^2)$, we can construct sequences of
$f_{\theta,m}^i$ that become more and more concentrated around $p_\theta^i$, but with a
significant amount of asymptotic disagreement at almost all sample paths for all $m$.
This establishes that asymptotic agreement is discontinuous at certainty for every model.

Under additional continuity and uniform convergence assumptions on the family
$\{f_{\theta,m}^i\}$, we characterize the families of subjective densities under which
asymptotic agreement is continuous at certainty. When $f_{\theta,m}^1$ and
$f_{\theta,m}^2$ are concentrated around the same $p$, these additional assumptions
ensure that asymptotic agreement is continuous at certainty. Otherwise, continuity of
asymptotic agreement depends on the tail properties of the family of subjective density
functions $\{f_{\theta,m}^i\}$. When this family has regularly-varying tails (such as the
Pareto or the log-normal distributions), even under the additional regularity conditions
that ensure uniform convergence, there will be a substantial amount of asymptotic
disagreement. When $\{f_{\theta,m}^i\}$ has rapidly-varying tails (such as the normal
distribution), asymptotic agreement will be continuous at certainty.

The intuition for this result is as follows. When the amount of uncertainty is small,
each individual believes that he will learn the state $\theta$, but he may still believe
that the other individual will fail to learn. Whether or not he believes this depends on
how an individual reacts when a frequency of signals different from the one he expects
with "almost certainty" occurs. If this "surprise" event ensures that the individual
learns $\theta$ (as it does in the case of learning under certainty), then each
individual will expect the other to learn when the frequency of signals under their model
of the world is realized and thus attaches probability arbitrarily close to 1 to the
event that they will asymptotically agree. This is what happens when the family
$\{f_{\theta,m}^i\}$ has rapidly-varying tails. However, when the family
$\{f_{\theta,m}^i\}$ has regularly-varying (thick) tails, an unexpected frequency of
signals will prevent the individual from learning (because he interprets this as a
possibility likely even near certainty due to the thick tails). In this case, each
individual expects the limiting frequencies to be consistent with his model and the other
individual not to learn the true state $\theta$, and concludes that there will be
significant asymptotic disagreement.

Lack  of  asymptotic  agreement  has  important  implications  for  a  range  of  economic 
situations.  We  illustrate  some  of  these  in  a  companion  paper  by  studying  a  number  of 
simple  environments  where  two  individuals  observe  the  same  sequence  of  signals  before 
or  while  playing  a  game  (Acemoglu,  Chernozhukov  and  Yildiz,  2008). 

Our results cast doubt on the idea that the common prior assumption may be justified
by learning. They imply that in many environments, even when there is little uncertainty
so that each individual believes that he will learn the true state, Bayesian learning
does not necessarily imply agreement about the relevant parameters. Consequently,
the strategic outcomes may be significantly different from those in the common-prior
environments.^1 Whether this assumption is warranted therefore depends on the specific
setting and what type of information individuals are trying to glean from the data.

Relating our results to the famous Blackwell-Dubins (1962) theorem may help clarify
their essence. This theorem shows that when two agents agree on zero-probability events
(i.e., their priors are absolutely continuous with respect to each other), asymptotically,
they will make the same predictions about future frequencies of signals. Our results do
not contradict this theorem, since we impose absolute continuity. Instead, as pointed out
above, our results rely on the fact that agreeing about future frequencies is not the same
as agreeing about the underlying payoff-relevant variables, because of the identification
problem that arises in the presence of uncertainty.^2 This identification problem leads to
different possible interpretations of the same signal sequence by individuals with
different priors. In most economic situations, what is important is not future frequencies
of signals but some payoff-relevant parameter. For example, what is relevant for
economists trying to evaluate a policy is not the frequency of estimates on the effect of
similar policies from other researchers, but the impact of this specific policy when (and
if) implemented. Similarly, what may be relevant in trading assets is not the frequency of
information about the dividend process, but the actual dividend that the asset will pay.
Thus, many situations in which individuals need to learn about a parameter or state that
will determine their ultimate payoff as a function of their action fall within the realm
of the analysis here. Our main result shows that even when this identification problem is
negligible for individual learning, its implications for asymptotic agreement may be large.


^1 For previous arguments on whether game-theoretic models should be formulated with all
individuals having a common prior, see, for example, Aumann (1986, 1998) and Gul (1998).
Gul (1998), for instance, questions whether the common prior assumption makes sense when
there is no ex ante stage.

^2 In this respect, our paper is also related to Kurz (1994, 1996), who considers a
situation in which agents agree about long-run frequencies, but their beliefs fail to
merge because of the non-stationarity of the world.

In this respect, our work differs from papers, such as Freedman (1963, 1965) and
Miller and Sanchirico (1999), that question the applicability of the absolute continuity
assumption in the Blackwell-Dubins theorem in statistical and economic settings (see
also Diaconis and Freedman, 1986, Stinchcombe, 2005). Similarly, a number of important
theorems in statistics, for example, Berk (1966), show that when individuals place
zero probability on the true data generating process, limiting posteriors will have their
support on the set of all identifiable values (though they may fail to converge to a
limiting distribution). Our results are different from those of Berk both because in our
model individuals always place positive probability on the truth and also because we
provide a tight characterization of the conditions for lack of asymptotic learning and
agreement.^3 In addition, neither Berk nor any other paper that we are aware of
investigates whether asymptotic agreement is continuous at certainty, which is the main
focus of our paper.

Our paper is also related to recent independent work by Cripps, Ely, Mailath and
Samuelson (2006), who study the conditions under which there will be "common learning"
by two agents observing correlated private signals. Cripps et al. focus on a model in
which individuals start with common priors and then learn from private signals under
certainty (though they note that their results could be extended to the case of
non-common priors). They show that individual learning ensures "approximate common
knowledge" when the signal space is finite, but not necessarily when it is infinite. In
contrast, we focus on the case in which the agents start with heterogeneous priors and
learn from public signals under uncertainty or under approximate certainty. Since all
signals are public in our model, there is no difficulty in achieving approximate common
knowledge.^4


^3 In dynamic games, another source of non-learning (and thus lack of convergence to a
common prior) is that some subgames are never visited along the equilibrium path and thus
players do not observe information that contradicts their beliefs about payoffs in these
subgames (see Fudenberg and Levine, 1993, Fudenberg and Kreps, 1995). Our results differ
from those in this literature, since individuals fail to learn or fail to reach agreement
despite the fact that they receive signals about all payoff-relevant variables.

^4 Put differently, we ask whether a player thinks that the other player will learn,
whereas Cripps et al. ask whether a player $i$ thinks that the other player $j$ thinks
that $i$ thinks that $j$ thinks that ... a player will learn.


The rest of the paper is organized as follows. Section 2 provides a number of
preliminary results focusing on the simple case of two states and two signals. Section 3
contains our main results, characterizing the conditions under which agreement is
continuous at certainty. Section 4 provides generalizations of these results to an
environment with $K$ states and $L > K$ signals. Section 5 concludes, while the Appendix
contains the proofs omitted from the text.

2 The Two-State Model and Preliminary Results

2.1 Environment

We start with a two-state model with binary signals. This model is sufficient to establish
all our main results in the simplest possible setting. These results are generalized to
an arbitrary number of states and signal values in Section 4.

There are two individuals, denoted by $i = 1$ and $i = 2$, who observe a sequence
of signals $\{s_t\}_{t=1}^{n}$ where $s_t \in \{a, b\}$. The underlying state is
$\theta \in \{A, B\}$, and agent $i$ assigns ex ante probability $\pi^i \in (0, 1)$ to
$\theta = A$. The individuals believe that, given $\theta$, the signals are exchangeable,
i.e., they are independently and identically distributed with an unknown distribution.^5
That is, the probability of $s_t = a$ given $\theta = A$ is an unknown number $p_A$;
likewise, the probability of $s_t = b$ given $\theta = B$ is an unknown number $p_B$, as
shown in the following table:


$$
\begin{array}{c|cc}
    & A       & B       \\ \hline
  a & p_A     & 1 - p_B \\
  b & 1 - p_A & p_B
\end{array}
$$

Our main departure from the standard models is that we allow the individuals to
be uncertain about $p_A$ and $p_B$. We denote the cumulative distribution function of
$p_\theta$ according to individual $i$, namely his subjective probability distribution,
by $F_\theta^i$. In the standard models, $F_\theta^i$ is degenerate (Dirac) and puts
probability 1 at some $p_\theta^i$. In contrast, for most of the analysis, we will impose
the following assumption:


^5 See, for example, Billingsley (1995). If there were only one state, then our model
would be identical to De Finetti's canonical model (see, for example, Savage, 1954). In
the context of this model, De Finetti's theorem provides a Bayesian foundation for
classical probability theory by showing that exchangeability (i.e., invariance under
permutations of the order of signals) is equivalent to having an independent identical
unknown distribution and implies that posteriors converge to long-run frequencies.
De Finetti's decomposition of probability distributions is extended by Jackson, Kalai and
Smorodinsky (1999) to cover cases without exchangeability.


Assumption 1 For each $i$ and $\theta$, $F_\theta^i$ has a continuous, non-zero and finite
density $f_\theta^i$ over $[0, 1]$.

The assumption implies that $F_\theta^i$ has full support over $[0, 1]$. As discussed in
Remark 2, Assumption 1 is stronger than necessary for our results, but simplifies the
exposition. In addition, throughout we assume that $\pi^1$, $\pi^2$, $F_\theta^1$ and
$F_\theta^2$ are known to both individuals.^6
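
To make the environment concrete, the following Python sketch simulates one sample path from a single individual's subjective model. It is an illustration only: the Beta subjective distributions, the parameter values and the function name `simulate_path` are assumptions introduced here and are not part of the model's specification (a Beta density with these parameters also vanishes at the endpoints of $[0, 1]$, so it only approximates Assumption 1).

```python
import random

def simulate_path(n, prior_A, beta_A, beta_B, seed=0):
    """Draw (theta, p_theta, r_n / n) from one individual's subjective model.

    beta_A and beta_B are (alpha, beta) pairs for the ASSUMED Beta
    distributions of p_A and p_B; they stand in for F_A and F_B.
    """
    rng = random.Random(seed)
    theta = "A" if rng.random() < prior_A else "B"
    if theta == "A":
        p = rng.betavariate(*beta_A)   # p_A = Pr(s_t = a | theta = A)
        prob_a = p
    else:
        p = rng.betavariate(*beta_B)   # p_B = Pr(s_t = b | theta = B)
        prob_a = 1.0 - p
    # Given theta and p, signals are i.i.d.; count how often s_t = a.
    r_n = sum(rng.random() < prob_a for _ in range(n))
    return theta, p, r_n / n

theta, p, freq = simulate_path(20_000, prior_A=0.5, beta_A=(8, 2), beta_B=(8, 2))
expected = p if theta == "A" else 1.0 - p
print(theta, round(p, 3), round(freq, 3), round(expected, 3))
```

The empirical frequency should settle near $p_A$ when $\theta = A$ and near $1 - p_B$ when $\theta = B$. An observer who sees only the frequency cannot tell these cases apart without further restrictions, since a frequency of, say, 0.7 is consistent with both $(\theta = A, p_A = 0.7)$ and $(\theta = B, p_B = 0.3)$.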

We consider infinite sequences $s \equiv \{s_t\}_{t=1}^{\infty}$ of signals and write $S$
for the set of all such sequences. The posterior belief of individual $i$ about $\theta$
after observing the first $n$ signals $\{s_t\}_{t=1}^{n}$ is

$$\phi_n^i(s) \equiv \mathrm{Pr}^i\left(\theta = A \mid \{s_t\}_{t=1}^{n}\right),$$

where $\mathrm{Pr}^i(\theta = A \mid \{s_t\}_{t=1}^{n})$ denotes the posterior probability
that $\theta = A$ given a sequence of signals $\{s_t\}_{t=1}^{n}$ under prior $\pi^i$ and
subjective probability distribution $F_\theta^i$. Since the sequence of signals, $s$, is
generated by an exchangeable process, the order of the signals does not matter for the
posterior. It only depends on

$$r_n(s) \equiv \#\{t \le n \mid s_t = a\},$$

the number of times $s_t = a$ out of the first $n$ signals.^7 By the strong law of large
numbers, $r_n(s)/n$ converges to some $\rho(s) \in [0, 1]$ almost surely according to both
individuals. Defining the set

$$\bar{S} \equiv \left\{ s \in S : \lim_{n \to \infty} r_n(s)/n \text{ exists} \right\}, \tag{1}$$

this observation implies that $\mathrm{Pr}^i(s \in \bar{S}) = 1$ for $i = 1, 2$. We will
often state our results for all sample paths $s$ in $\bar{S}$, which equivalently implies
that these statements are true almost surely or with probability 1. Now, a straightforward
application of the Bayes rule gives


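$$\phi_n^i(s) = \frac{1}{1 + \dfrac{1 - \pi^i}{\pi^i} \, \dfrac{\mathrm{Pr}^i(r_n \mid \theta = B)}{\mathrm{Pr}^i(r_n \mid \theta = A)}}, \tag{2}$$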


^6 Since our purpose is to understand whether learning justifies the common prior
assumption, we do not assume a common prior, allowing agents to have differing beliefs
even when the beliefs are commonly known.

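^7 Given the definition of $r_n(s)$, the probability distribution $\mathrm{Pr}^i$ on
$\{A, B\} \times S$ is

$$\mathrm{Pr}^i\left(E^{A,s,n}\right) \equiv \pi^i \int_0^1 p^{r_n(s)} (1 - p)^{n - r_n(s)} f_A^i(p)\, dp, \text{ and}$$

$$\mathrm{Pr}^i\left(E^{B,s,n}\right) \equiv \left(1 - \pi^i\right) \int_0^1 (1 - p)^{r_n(s)} p^{n - r_n(s)} f_B^i(p)\, dp$$

at each event $E^{\theta,s,n} = \{(\theta, s') \mid s'_t = s_t \text{ for each } t \le n\}$,
where $s \equiv \{s_t\}_{t=1}^{\infty}$ and $s' \equiv \{s'_t\}_{t=1}^{\infty}$.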


where $\mathrm{Pr}^i(r_n \mid \theta)$ is the probability of observing the signal
$s_t = a$ exactly $r_n$ times out of $n$ signals with respect to the distribution
$F_\theta^i$.
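
The Bayes-rule computation just described can be sketched numerically. The sketch assumes, purely for illustration, that the subjective distributions are Beta distributions, so that the integrals defining the probability of $r_n$ given each state have a closed form; the function names and parameter values are hypothetical.

```python
from math import exp, lgamma, log

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(k, n, a, b):
    """log of the integral of p^k (1-p)^(n-k) against an ASSUMED Beta(a, b) density.

    The binomial coefficient is omitted because it cancels in the ratio below.
    """
    return log_beta_fn(a + k, b + n - k) - log_beta_fn(a, b)

def posterior_A(r_n, n, prior_A, beta_A, beta_B):
    """Posterior probability of theta = A after r_n signals 'a' out of n."""
    log_like_A = log_marginal(r_n, n, *beta_A)       # Pr(a | A) = p_A
    log_like_B = log_marginal(n - r_n, n, *beta_B)   # Pr(b | B) = p_B
    log_odds_B = log((1 - prior_A) / prior_A) + log_like_B - log_like_A
    if log_odds_B > 700:                             # avoid overflow in exp
        return 0.0
    return 1.0 / (1.0 + exp(log_odds_B))

# Hold the frequency of 'a' at 0.8 and let the sample grow.
for n in (10, 100, 1000, 10000):
    print(n, posterior_A(4 * n // 5, n, 0.5, (8, 2), (8, 2)))
```

Working with log-likelihoods keeps the computation stable for large $n$. With these assumed full-support densities, the printed posteriors should level off at a value below 1 rather than converge to 1, in line with the lack of asymptotic learning under full support.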

The following lemma provides a useful formula for
$\phi_\infty^i(s) \equiv \lim_{n \to \infty} \phi_n^i(s)$ for all sample paths $s$ in
$\bar{S}$ and also introduces the concept of the asymptotic likelihood ratio. Both the
formula and the asymptotic likelihood ratio are crucial for our analyses throughout the
paper.

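Lemma 1 Suppose Assumption 1 holds. Then for all $s \in \bar{S}$,

$$\phi_\infty^i(\rho(s)) \equiv \lim_{n \to \infty} \phi_n^i(s) = \frac{1}{1 + \dfrac{1 - \pi^i}{\pi^i} R^i(\rho(s))}, \tag{3}$$

where $\rho(s) = \lim_{n \to \infty} r_n(s)/n$, and $\forall \rho \in [0, 1]$,

$$R^i(\rho) \equiv \frac{f_B^i(1 - \rho)}{f_A^i(\rho)} \tag{4}$$

is the asymptotic likelihood ratio.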


Proof.  See  the  Appendix.    ■ 

In equation (4), $R^i(\rho)$ is the asymptotic likelihood ratio of observing frequency
$\rho$ of $a$ when the true state is $B$ versus when it is $A$. Lemma 1 states that,
asymptotically, individual $i$ uses this likelihood ratio and Bayes rule to compute his
posterior beliefs about $\theta$.
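
As a hedged illustration of the lemma, the sketch below evaluates the limiting posterior for a given long-run frequency. It reads the asymptotic likelihood ratio as the subjective density of $p_B$ at $1 - \rho$ divided by the subjective density of $p_A$ at $\rho$, and it assumes Beta densities; both the densities and the function names are assumptions made here for illustration.

```python
from math import exp, lgamma, log

def beta_pdf(x, a, b):
    """Density of an ASSUMED Beta(a, b) subjective distribution at x in (0, 1)."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))

def asymptotic_posterior(rho, prior_A, beta_A, beta_B):
    """Limiting posterior on theta = A given the long-run frequency rho of 'a'."""
    # Frequency rho of 'a' corresponds to p_A = rho under A and p_B = 1 - rho under B.
    R = beta_pdf(1 - rho, *beta_B) / beta_pdf(rho, *beta_A)
    return 1.0 / (1.0 + (1 - prior_A) / prior_A * R)

for rho in (0.2, 0.5, 0.8):
    print(rho, asymptotic_posterior(rho, 0.5, (8, 2), (8, 2)))
```

Because both assumed densities are strictly positive on the interior of $[0, 1]$, the likelihood ratio is finite and non-zero there, so the limiting posterior stays strictly between 0 and 1 and depends on the prior.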

In the statements about learning, without loss of generality, we suppose that in reality
$\theta = A$. The two questions of interest for us are:

1. Asymptotic learning: whether $\mathrm{Pr}^i\left(\lim_{n \to \infty} \phi_n^i(s) = 1 \mid \theta = A\right) = 1$ for $i = 1, 2$.

2. Asymptotic agreement: whether $\mathrm{Pr}^i\left(\lim_{n \to \infty} \left|\phi_n^1(s) - \phi_n^2(s)\right| = 0\right) = 1$ for $i = 1, 2$.

Notice that both asymptotic learning and agreement are defined in terms of the
ex ante probability assessments of the two individuals. Therefore, asymptotic learning
implies that an individual believes that he or she will ultimately learn the truth, while
asymptotic agreement implies that both individuals believe that their assessments will
eventually converge.^8


^8 We formulate asymptotic learning and agreement in terms of each individual's initial
probability measure so as not to take a position on what the "objective" or "true"
probability measure is. Under Assumption 1, asymptotic learning and agreement occur iff
the corresponding limits hold for almost all long-run frequencies $\rho(s) \in [0, 1]$
under Lebesgue measure, which also has an "objective" meaning.
2.2 Asymptotic Learning and Agreement with Full Identification

In this subsection, we provide a number of preliminary results on the conditions under
which there will be asymptotic learning and agreement. These results will be used
as the background for the investigation of the continuity of asymptotic agreement at
certainty in the next section. Throughout this subsection we focus on environments
where Assumption 1 does not hold.

The following result generalizes Savage's (1954) well-known result on asymptotic
learning and agreement. Savage's Theorem, which is then stated as Corollary 1 below,
is the basis of the argument that Bayesian learning will push individuals towards common
beliefs and priors. Let us denote the support of a distribution $F$ by $\operatorname{supp} F$
and define $\inf(\operatorname{supp} F)$ to be the infimum of the set $\operatorname{supp} F$
(i.e., the largest $p$ such that $F(p) = 0$). Also let us define the threshold value

$$\hat{\rho}(p_A, p_B) \equiv \frac{\log\left(p_B / (1 - p_A)\right)}{\log\left(p_B / (1 - p_A)\right) + \log\left(p_A / (1 - p_B)\right)}. \tag{5}$$

(For future reference, this is the unique solution to the equation
$p_A^{\rho} (1 - p_A)^{1 - \rho} = p_B^{1 - \rho} (1 - p_B)^{\rho}$.)
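
The threshold can be checked numerically. The sketch below takes the threshold to be the solution in $\rho$ of $p_A^{\rho}(1 - p_A)^{1-\rho} = p_B^{1-\rho}(1 - p_B)^{\rho}$, which after taking logarithms gives a ratio of log-odds terms; the function name `rho_hat` and the parameter values are illustrative assumptions.

```python
from math import log

def rho_hat(p_A, p_B):
    """Frequency of 'a' at which states A and B are equally likely under known p_A, p_B."""
    num = log(p_B / (1 - p_A))
    return num / (num + log(p_A / (1 - p_B)))

p_A, p_B = 0.9, 0.7
rho = rho_hat(p_A, p_B)
# Both sides of the defining equation, evaluated at the threshold.
lhs = p_A ** rho * (1 - p_A) ** (1 - rho)
rhs = p_B ** (1 - rho) * (1 - p_B) ** rho
print(rho, lhs, rhs)
```

When $p_A = p_B$ the threshold is exactly $1/2$, and when both exceed $1/2$ it lies strictly between $1 - p_B$ and $p_A$.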

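Theorem 1 (Generalized Asymptotic Learning and Agreement) Define $\hat{\rho}(p_A, p_B)$
as in (5). Assume that for each $\theta$ and $i$,
$p_{\theta,i} \equiv \inf(\operatorname{supp} F_\theta^i) \in (1/2, 1)$ and
$1 - p_{B,i} \neq \hat{\rho}(p_{A,j}, p_{B,j}) \neq p_{A,i}$ for all $i \neq j$. Then for
all $i \neq j$,

1. $\mathrm{Pr}^i\left(\lim_{n \to \infty} \phi_n^i(s) = 1 \mid \theta = A\right) = 1$;

2. $\mathrm{Pr}^i\left(\lim_{n \to \infty} \left|\phi_n^1(s) - \phi_n^2(s)\right| = 0\right) = 1$ if and only if $1 - p_{B,i} < \hat{\rho}(p_{A,j}, p_{B,j}) < p_{A,i}$.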
Proof.  Both  parts  of  the  theorem  are  a  consequence  of  the  following  claim. 

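Claim 1 For any $s \in \bar{S}$,

$$\lim_{n \to \infty} \phi_n^i(s) = \begin{cases} 1 & \text{if } \rho(s) > \hat{\rho}(p_{A,i}, p_{B,i}), \\ 0 & \text{if } \rho(s) < \hat{\rho}(p_{A,i}, p_{B,i}), \end{cases} \tag{6}$$

where $\rho(s) = \lim_{n \to \infty} r_n(s)/n$.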


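(Proof of Claim) Let

$$R_n^i(r_n) \equiv \frac{\mathrm{Pr}^i(r_n \mid \theta = B)}{\mathrm{Pr}^i(r_n \mid \theta = A)} = \frac{\int (1 - p)^{r_n} p^{n - r_n}\, dF_B^i}{\int p^{r_n} (1 - p)^{n - r_n}\, dF_A^i}.$$

Take any $\rho > \hat{\rho}(p_{A,i}, p_{B,i})$. Since $1 - p_{B,i} < p_{A,i}$,

$$(1 - p_{B,i})^{\rho}\, p_{B,i}^{1 - \rho} < p_{A,i}^{\rho}\, (1 - p_{A,i})^{1 - \rho}. \tag{7}$$

The function $p^{\rho} (1 - p)^{1 - \rho}$ is continuous and concave in $p$, and reaches
its maximum at $p = \rho$. Then, (7) implies that there exist $\epsilon > 0$ and
$\bar{p} > p_{A,i}$ such that for all $p \in \operatorname{supp} F_B^i$,
$p' \in [p_{A,i}, \bar{p}]$, and $r_n/n \in (\rho - \epsilon, \rho + \epsilon)$,

$$(1 - p)^{r_n/n}\, p^{1 - r_n/n} \le (1 - p_{B,i})^{r_n/n}\, p_{B,i}^{1 - r_n/n} < \bar{p}^{\,r_n/n} (1 - \bar{p})^{1 - r_n/n} \le p'^{\,r_n/n} (1 - p')^{1 - r_n/n}. \tag{8}$$

The first inequality in (8) implies that

$$\int (1 - p)^{r_n} p^{n - r_n}\, dF_B^i \le (1 - p_{B,i})^{r_n}\, p_{B,i}^{n - r_n}. \tag{9}$$

On the other hand, the last inequality in (8) implies that

$$\int p^{r_n} (1 - p)^{n - r_n}\, dF_A^i \ge \int_{p \le \bar{p}} p^{r_n} (1 - p)^{n - r_n}\, dF_A^i \ge F_A^i(\bar{p})\, \bar{p}^{\,r_n} (1 - \bar{p})^{n - r_n}, \tag{10}$$

where the first inequality follows from non-negativity of $p^{r_n} (1 - p)^{n - r_n}$. By
dividing the left-hand side [right-hand side] of (9) by the left-hand side [right-hand
side] of (10), we therefore obtain

$$0 \le R_n^i(r_n) \le \frac{1}{F_A^i(\bar{p})} \left( \frac{(1 - p_{B,i})^{r_n/n}\, p_{B,i}^{1 - r_n/n}}{\bar{p}^{\,r_n/n} (1 - \bar{p})^{1 - r_n/n}} \right)^{n}. \tag{11}$$

Equation (6) follows from (11). By (8), when $r_n/n \in [\rho - \epsilon/2, \rho + \epsilon/2]$,
the expression in parentheses in (11) is smaller than 1, so that the right-hand side
converges to 0 as $n \to \infty$ and $r_n/n \to \rho$. Therefore, $R_n^i(r_n) \to 0$, and
thus $\phi_n^i(s) \to 1$. The same argument (switching $A$ and $B$) implies that
$\phi_n^i(s) \to 0$ when $\rho < \hat{\rho}(p_{A,i}, p_{B,i})$. $\square$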

(Part 1) Since $p_{\theta,i} = \inf(\operatorname{supp} F_\theta^i) \in (1/2, 1)$, (6)
implies that conditional on $\theta = A$, agent $i$ assigns probability 1 to the event
that $s \in \bar{S}$ and $\rho(s) \ge p_{A,i} > \hat{\rho}(p_{A,i}, p_{B,i})$, where the
last inequality follows from (5). This implies that
$\mathrm{Pr}^i\left(\lim_{n \to \infty} \phi_n^i(s) = 1 \mid \theta = A\right) = 1$.

(Part 2: Sufficiency) We prove that $1 - p_{B,i} < \hat{\rho}(p_{A,j}, p_{B,j}) < p_{A,i}$
implies asymptotic agreement. Suppose $\hat{\rho}(p_{A,j}, p_{B,j}) < p_{A,i}$. Then,
conditional on $\theta = A$, (6) implies that $\phi_n^j(s)$ also converges to 1, and
therefore $\left|\phi_n^1(s) - \phi_n^2(s)\right| \to 0$. Next, when
$\hat{\rho}(p_{A,j}, p_{B,j}) > 1 - p_{B,i}$, conditional on $\theta = B$,
$\phi_n^i(s) \to 0$ and $\phi_n^j(s) \to 0$. This establishes that
$\left|\phi_n^1(s) - \phi_n^2(s)\right| \to 0$ and proves sufficiency.

(Part 2: Necessity) We prove that asymptotic agreement implies the inequality
$1 - p_{B,i} < \hat{\rho}(p_{A,j}, p_{B,j}) < p_{A,i}$. Suppose the inequality does not
hold, and consider the case $p_{A,i} < \hat{\rho}(p_{A,j}, p_{B,j})$. Then, $i$ assigns
strictly positive probability to the event that
$r_n(s)/n \to \rho(s) \in [p_{A,i}, \hat{\rho}(p_{A,j}, p_{B,j}))$. But (6) implies
$\phi_n^i(s) \to 1$ and $\phi_n^j(s) \to 0$, so that
$\left|\phi_n^1(s) - \phi_n^2(s)\right| \to 1$. Therefore, the beliefs diverge almost
surely. The argument for the case where $\hat{\rho}(p_{A,j}, p_{B,j}) < 1 - p_{B,i}$ is
analogous and completes the proof of the theorem. ■

Theorem 1 shows that under the "full identification assumption" that $p_{\theta,i} > 1/2$
for each $\theta$ and $i$, asymptotic learning always obtains. Furthermore, asymptotic
agreement depends on the lowest value $p_{\theta,i}$ of $p_\theta$ to which individual
$i = 1, 2$ assigns positive probability.
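
The agreement condition of Theorem 1 can be checked mechanically. The sketch below is a minimal illustration in which each individual is summarized by the lower ends of the supports of his subjective distributions; the helper names and the numerical values are hypothetical, and the threshold is computed as the frequency at which the two states are equally likely under known parameters.

```python
from math import log

def rho_hat(p_A, p_B):
    """Threshold frequency of 'a' separating the conclusions theta = A and theta = B."""
    num = log(p_B / (1 - p_A))
    return num / (num + log(p_A / (1 - p_B)))

def expects_agreement(bounds_i, bounds_j):
    """Check 1 - p_{B,i} < rho_hat(p_{A,j}, p_{B,j}) < p_{A,i} for individual i.

    bounds_x = (p_{A,x}, p_{B,x}) are the infima of the supports of x's
    subjective distributions, each assumed to lie in (1/2, 1).
    """
    p_A_i, p_B_i = bounds_i
    return 1 - p_B_i < rho_hat(*bounds_j) < p_A_i

# Similar lower bounds: each individual's threshold lies inside the other's range.
print(expects_agreement((0.8, 0.8), (0.7, 0.7)), expects_agreement((0.7, 0.7), (0.8, 0.8)))

# Very different lower bounds: individual j's threshold (about 0.84) exceeds p_{A,i} = 0.6.
print(expects_agreement((0.6, 0.6), (0.99, 0.55)), expects_agreement((0.99, 0.55), (0.6, 0.6)))
```

The second example shows that the condition need not be symmetric: it can hold from one individual's viewpoint and fail from the other's.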

An  immediate  corollary  is  Savage's  theorem. 

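Corollary 1 (Savage's Theorem) Assume that each $F_\theta^i$ puts probability 1 on
$p_\theta$ for some $p_\theta > 1/2$, i.e., $F_\theta^i(p_\theta) = 1$ and
$F_\theta^i(p) = 0$ for each $p < p_\theta$. Then, for each $i = 1, 2$,

1. $\mathrm{Pr}^i\left(\lim_{n \to \infty} \phi_n^i(s) = 1 \mid \theta = A\right) = 1$.

2. $\mathrm{Pr}^i\left(\lim_{n \to \infty} \left|\phi_n^1(s) - \phi_n^2(s)\right| = 0\right) = 1$.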

It is useful to spell out the intuition for Theorem 1 and Corollary 1. Let us start
with the latter. Corollary 1 states that when the individuals know the conditional
distributions of the signals (and hence they agree what those distributions are), they
will learn the truth with experience (almost surely as $n \to \infty$) and two individuals
observing the same sequence will necessarily come to agree what the underlying state,
$\theta$, is. A simple intuition for this result is that the underlying state $\theta$ is
fully identified from the limiting frequencies, so that both individuals can infer the
underlying state from the observation of the limiting frequencies of signals.

However, there is more to this corollary than this simple intuition. Each individual
is sure that he will be confronted either with a limiting frequency of $a$ signals equal
to $p_A$, in which case he will conclude that $\theta = A$, or with a limiting
frequency of $1 - p_B$, in which case he will conclude that $\theta = B$; and he attaches zero probability
to the event that he will observe a different asymptotic frequency. What happens if
an individual observes a frequency $\rho$ of signals different from $p_A$ and $1 - p_B$ in a large
sample of size $n$? The answer to this question will provide the intuition for some of
the results that we will present in the next section. Observe that this event has zero
probability under the individual's beliefs at the limit $n = \infty$. However, for $n < \infty$ he will
assign a strictly positive (but small) probability to such a frequency of signals resulting
from sampling variation. Moreover, it is straightforward to see that there exists a unique
$\hat p(p_A, p_B) \in (1 - p_B, p_A)$ given by (5) above such that when $\rho > \hat p(p_A, p_B)$, the required
sampling variation that leads to $\rho$ under $\theta = B$ is infinitely greater (as $n \to \infty$) than the
one under $\theta = A$. Consequently, when $\rho > \hat p(p_A, p_B)$, the individual will asymptotically
assign probability 1 to the event that $\theta = A$. Conversely, when $\rho < \hat p(p_A, p_B)$, he will
assign probability 1 to $\theta = B$.
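The cutoff argument can be checked numerically. The sketch below is illustrative only: it obtains the cutoff by equating the per-signal log-likelihoods of a frequency $\rho$ under the two states, which is assumed here to coincide with equation (5) (not reproduced in this section). The function names and the parameter values $p_A = 0.8$, $p_B = 0.6$ are hypothetical.

```python
import math


def cutoff(p_a: float, p_b: float) -> float:
    """Frequency of `a` signals at which states A and B fit the data equally well.

    Under A each signal is `a` with probability p_a; under B with 1 - p_b.
    """
    num = math.log(p_b / (1.0 - p_a))
    den = math.log(p_a / (1.0 - p_b)) + num
    return num / den


def log_lr_per_signal(rho: float, p_a: float, p_b: float) -> float:
    """Per-signal log-likelihood of frequency rho under A minus that under B."""
    ll_a = rho * math.log(p_a) + (1.0 - rho) * math.log(1.0 - p_a)
    ll_b = rho * math.log(1.0 - p_b) + (1.0 - rho) * math.log(p_b)
    return ll_a - ll_b


def posterior_a(n: int, rho: float, p_a: float, p_b: float, prior: float) -> float:
    """Posterior probability of A after n signals with frequency rho of `a`."""
    log_odds = math.log(prior / (1.0 - prior)) + n * log_lr_per_signal(rho, p_a, p_b)
    if log_odds >= 0:  # branching keeps math.exp from overflowing
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    p_a, p_b, prior = 0.8, 0.6, 0.5  # hypothetical values
    c = cutoff(p_a, p_b)
    print(f"cutoff = {c:.4f}")
    for rho in (c - 0.03, c + 0.03):
        beliefs = [posterior_a(n, rho, p_a, p_b, prior) for n in (10, 100, 10_000)]
        print(f"rho = {rho:.4f}: {beliefs}")
```

A frequency slightly above the cutoff drives the posterior toward 1 as $n$ grows, and a frequency slightly below drives it toward 0, even though neither frequency equals $p_A$ or $1 - p_B$.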

The intuition for Theorem 1 is very similar to that of Corollary 1. The assumption
that $\inf(\operatorname{supp} F_\theta^i) \in (1/2, 1)$ generalizes the assumption that $\hat p_\theta > 1/2$ in Corollary 1, and
is sufficient to ensure that asymptotically each individual will learn the payoff-relevant
state $\theta$, and that, before observing the sequence of signals, he expects both himself and
the other player to do so. In particular, similar to the intuition for Corollary 1, when
individual $i$ observes a frequency $\rho \in (1 - \hat p_B^i, \hat p_A^i)$, he presumes that this has resulted
from sampling variation, and decides whether frequency $\rho$ is more likely under $\theta = A$
or under $\theta = B$. For each $\theta$, the lowest sampling variation that leads
to $\rho$ is attained at $\hat p_\theta^i$, and the asymptotic beliefs depend only on how large these
variations are. When $\rho > \hat p(\hat p_A^i, \hat p_B^i)$ (and as $n \to \infty$) the necessary sampling variation
is infinitely smaller under $\theta = A$ than under $\theta = B$. Consequently, the individual
believes with probability 1 that $\theta = A$. Conversely, when $\rho < \hat p(\hat p_A^i, \hat p_B^i)$, he believes
with probability 1 that $\theta = B$. Whether there will be asymptotic agreement then depends
purely on whether and how different the cutoff values $\hat p(\hat p_A^1, \hat p_B^1)$ and $\hat p(\hat p_A^2, \hat p_B^2)$ are.
When they are close, both individuals will interpret the limiting frequency of signals, $\rho$,
similarly, even when this is a frequency to which they initially assigned zero probability,
and will reach asymptotic agreement.^

The  next  corollary  highlights  a  range  of  conditions  other  than  those  in  Corollary  1 
that,  according  to  part  2  of  Theorem  1,  are  sufficient  for  asymptotic  agreement. 


^In contrast, if these cutoff values were far apart, so that $\hat p(\hat p_A^j, \hat p_B^j) \notin (1 - \hat p_B^i, \hat p_A^i)$, both players
would assign positive probability to the event that their beliefs would diverge to the extremes and we
would thus have $\lim_{n\to\infty} |\phi_n^1(s) - \phi_n^2(s)| = 1$.

Corollary 2 (Sufficient Conditions for Asymptotic Agreement) Suppose that
$\hat p_\theta^i = \inf(\operatorname{supp} F_\theta^i) \in (1/2, 1)$. Then, there is asymptotic agreement whenever any one of
the following conditions holds:

1. certainty (with symmetry): each $F_\theta^i$ puts probability 1 on some $\hat p^i > 1/2$;

2. symmetric support: $\operatorname{supp} F_A^i = \operatorname{supp} F_B^i$ for each $i$;

3. common support: $\operatorname{supp} F_\theta^1 = \operatorname{supp} F_\theta^2$ for each $\theta$.

Proof. Part 1 of the corollary is a special case of part 2. Under the symmetric support
assumption, we have $\hat p(\hat p_A^i, \hat p_B^i) = 1/2$ for each $i$, so that part 2 of the corollary follows
from part 2 of Theorem 1. Finally, part 3 of the corollary follows from the fact that
under the common support assumption $\hat p(\hat p_A^j, \hat p_B^j) = \hat p(\hat p_A^i, \hat p_B^i) \in (1 - \hat p_B^i, \hat p_A^i)$. ■

Corollary 2 shows that various reasonable conditions ensure asymptotic agreement.
Asymptotic agreement is implied, for example, by the certainty, symmetric support, or
common support assumptions. In particular, certainty (with symmetry), which corresponds
to both individuals believing that limiting frequencies have to be $\hat p^i$ or $1 - \hat p^i$ (but with
$\hat p^1 \neq \hat p^2$), is sufficient for asymptotic agreement. In this case, each individual is certain
about what the limiting frequency will be and therefore believes that the frequency
expected by the other individual will not be realized (creating a discrepancy between that
individual's initial belief and observation). Nevertheless, with the same reasoning as in
the discussion following Corollary 1, each individual also believes that the other
individual will ascribe this discrepancy to sampling variation and reach the same conclusion as
himself. This is sufficient for asymptotic agreement.

Theorem 1 and Corollary 2 therefore show that results on asymptotic learning and
agreement are substantially more general than Savage's original theorem. Nevertheless,
these results do rely on the feature that $F_\theta^i(1/2) = 0$ for each $i = 1,2$ and each $\theta$
(thus implicitly imposing that Assumption 1 does not hold). This feature implies that
both individuals attach zero probability to a range of possible models of the world; that is,
they are certain that $p_\theta$ cannot be less than 1/2. There are two reasons for considering
situations in which this is not the case. First, the preceding discussion illustrates why
assigning zero probability to certain models of the world is important; it enables
individuals to ascribe any frequency of signals that are unlikely under these models to sampling
variability. This kind of inference may be viewed as somewhat unreasonable, since
individuals are reaching very strong conclusions based on events that have vanishingly small
probabilities (since sampling variability vanishes as $n \to \infty$). Second, our motivation of
investigating learning under uncertainty suggests that individuals may attach positive
(albeit small) probabilities to all possible values of $p_\theta$. This latter feature is the essence
of Assumption 1 (the "full support" requirement).

2.3      Failure  of  Asymptotic  Learning  and  Agreement  with  Full 
Support 

We next impose Assumption 1 and show that under the more general circumstances
where $F_\theta^i$ has full support, there will be neither asymptotic learning nor asymptotic
agreement.

Theorem 2 (Lack of Asymptotic Learning and Agreement) Under Assumption 1,

1. $\Pr^i(\lim_{n\to\infty} \phi_n^i(s) \neq 1 \mid \theta = A) = 1$ for $i = 1,2$;

2. $\Pr^i(\lim_{n\to\infty} |\phi_n^1(s) - \phi_n^2(s)| \neq 0) = 1$ whenever $\pi^1 \neq \pi^2$ and $F_\theta^1 = F_\theta^2$ for each
$\theta \in \{A, B\}$.

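Proof. Since $f_B^i(1 - \rho(s)) > 0$ and $f_A^i(\rho(s))$ is finite, $R^i(\rho(s)) > 0$. Hence, by
Lemma 1, $\phi_\infty^i(\rho(s)) \neq 1$ for each $s$, establishing the first part. To see the second part,
note that, by Lemma 1, for any $s \in S$,

$$\phi_\infty^1(\rho(s)) = \phi_\infty^2(\rho(s)) \quad \text{if and only if} \quad \frac{1 - \pi^1}{\pi^1} R^1(\rho(s)) = \frac{1 - \pi^2}{\pi^2} R^2(\rho(s)). \qquad (12)$$

Since $\pi^1 \neq \pi^2$ and $F_\theta^1 = F_\theta^2$, this implies that for each $s \in S$, $\phi_\infty^1(s) \neq \phi_\infty^2(s)$, and thus
$\Pr^i(|\phi_\infty^1(s) - \phi_\infty^2(s)| \neq 0) = 1$ for $i = 1,2$. ■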

Remark 1 The assumption that $F_\theta^1 = F_\theta^2$ in this theorem is adopted for simplicity.
We can see from (12) that even in the absence of this condition, there will typically be
no asymptotic agreement. Theorem 6 in Section 4 states a more general version of this
result for the case of multidimensional state and signals, and shows how the assumption
that $F_\theta^1 = F_\theta^2$ can be relaxed significantly.

Remark 2 Assumption 1 is considerably stronger than the necessary conditions for
Theorem 2. It is adopted only for simplicity. It can be verified that for lack of
asymptotic learning it is sufficient (but not necessary) that the measures generated by
the distribution functions $F_A^i(p)$ and $F_B^i(1 - p)$ be absolutely continuous with respect
to each other. Similarly, for lack of asymptotic agreement, it is sufficient (but not
necessary) that the measures generated by $F_A^1(p)$, $F_B^1(1 - p)$, $F_A^2(p)$ and $F_B^2(1 - p)$ be
absolutely continuous with respect to each other. For example, if both individuals believe
that $p_A$ is either 0.3 or 0.7 (with the latter receiving greater probability) and that $p_B$ is
also either 0.3 or 0.7 (with the former receiving greater probability), then there will be
neither asymptotic learning nor asymptotic agreement. Throughout we use Assumption
1 both because it simplifies the notation and because it is a natural assumption when
we turn to the analysis of asymptotic agreement as the amount of uncertainty vanishes.

Theorem 2 contrasts with Theorem 1 and implies that, with probability 1, each
individual will fail to learn the true state. The second part of the theorem states that
if the individuals' prior beliefs about the state differ (but they interpret the signals in
the same way), then their posteriors will eventually disagree, and moreover, they will
both attach probability 1 to the event that their beliefs will eventually diverge. Put
differently, this implies that there is "agreement to eventually disagree" between the two
individuals, in the sense that they both believe ex ante that after observing the signals
they will fail to agree.

Intuitively, when Assumption 1 (in particular, the full support feature) holds, an
individual is never sure about the exact interpretation of the sequence of signals he
observes and will update his views about $p_\theta$ (the informativeness of the signals) as well
as his views about the underlying state. For example, even when signal $a$ is more likely
in state $A$ than in state $B$, a very high frequency of $a$ will not necessarily convince him
that the true state is $A$, because he may infer that the signals are not as reliable as he
initially believed, and they may instead be biased towards $a$. Therefore, the individual
never becomes certain about the state, which is captured by the fact that $R^i(\rho)$ defined
in (4) never takes the value zero or infinity. Consequently, as shown in (3), his posterior
beliefs will be determined by his prior beliefs about the state and also by $R^i$, which tells
us how the individual updates his beliefs about the informativeness of the signals as he
observes the signals. When two individuals interpret the informativeness of the signals
in the same way (i.e., $R^1 = R^2$), the differences in their priors will always be reflected
in their posteriors.
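A small numerical sketch may help fix ideas. It assumes that the asymptotic posterior in (3) takes the form $\phi_\infty^i(\rho) = 1/(1 + \frac{1-\pi^i}{\pi^i} R^i(\rho))$ with $R^i(\rho) = f_B^i(1-\rho)/f_A^i(\rho)$, which is the form consistent with (12) and with the proof of Theorem 2. The density $f(p) = 0.2 + 1.6p$ is a hypothetical full-support example chosen only for illustration, not one used in the paper.

```python
def density(p: float) -> float:
    """Hypothetical full-support density on [0, 1]: f(p) = 0.2 + 1.6 p.

    It is continuous, strictly positive, integrates to one, and puts more
    weight on informative signals (p > 1/2).
    """
    return 0.2 + 1.6 * p


def likelihood_ratio(rho: float) -> float:
    """Asymptotic likelihood ratio R(rho) = f_B(1 - rho) / f_A(rho), with f_A = f_B."""
    return density(1.0 - rho) / density(rho)


def asymptotic_posterior(prior: float, rho: float) -> float:
    """Limiting belief that the state is A given limiting frequency rho."""
    return 1.0 / (1.0 + (1.0 - prior) / prior * likelihood_ratio(rho))


if __name__ == "__main__":
    # Two individuals share the density but hold priors 1/3 and 2/3.
    for rho in (0.1, 0.5, 0.9, 1.0):
        phi1 = asymptotic_posterior(1 / 3, rho)
        phi2 = asymptotic_posterior(2 / 3, rho)
        print(f"rho={rho:.1f}  phi1={phi1:.3f}  phi2={phi2:.3f}  gap={abs(phi1 - phi2):.3f}")
```

Because the likelihood ratio stays strictly between zero and infinity at every frequency, neither posterior reaches 0 or 1, and the gap between the two posteriors never closes.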

In contrast, if an individual were certain about the informativeness of the signals
(i.e., if $i$ were sure that $p_\theta = p_\theta^i$ for some $p_\theta^i > 1/2$) as in Theorem 1 and Corollary 2,
then he would never question the informativeness of the signals, even when the limiting
frequency of $a$ converges to a value different from $p_A^i$ or $1 - p_B^i$, and would interpret
such discrepancies as resulting from sampling variation. This would be sufficient for
asymptotic agreement when $p_A^i = p_B^i$. The full support assumption in Assumption 1
prevents this type of reasoning and ensures asymptotic disagreement.

3     Main  Results 

In  this  section,  we  present  our  main  results  concerning  the  potential  discontinuity  of 
asymptotic  agreement  at  certainty.  More  precisely,  we  investigate  whether  as  the  amount 
of  uncertainty  about  the  interpretation  of  the  signals  disappears  and  we  recover  the 
standard  model  of  learning  under  certainty,  the  amount  of  asymptotic  disagreement 
vanishes  continuously.  We  will  show  that  this  is  not  the  case,  so  that  one  can  perturb 
the standard model of learning under certainty slightly and obtain a model in which
there  is  substantial  asymptotic  disagreement.  We  first  show  that  asymptotic  agreement 
is  discontinuous  at  certainty  in  every  model,  including  the  canonical  model  of  learning 
under  certainty,  where  both  individuals  share  the  same  beliefs  regarding  the  conditional 
signal  distributions  (Theorem  3).  We  then  restrict  our  perturbations  to  a  class  that 
embodies  strong  continuity  and  uniform  convergence  assumptions.  Within  this  class  of 
perturbations,  we  characterize  the  conditions  under  which  asymptotic  agreement  will  be 
continuous  at  certainty  (Theorem  5). 

For any $\hat p \in [0, 1]$, write $\delta_{\hat p}$ for the Dirac distribution that puts probability 1 on $p = \hat p$;
i.e., $\delta_{\hat p}(p) = 1$ if $p \geq \hat p$ and $\delta_{\hat p}(p) = 0$ otherwise.

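Let $\{F_{\theta,m}^i\}_{m \in \mathbb{N},\, i \in N,\, \theta \in \Theta}$ ($\{F_{\theta,m}^i\}$ for short) denote an arbitrary sequence of subjective
probability distributions converging to a Dirac distribution $\delta_{p_\theta^i}$ for each $(i, \theta)$ as $m \to \infty$:

$$\lim_{m\to\infty} F_{\theta,m}^i(p) = \begin{cases} 0 & \text{if } p < p_\theta^i, \\ 1 & \text{if } p > p_\theta^i. \end{cases} \qquad (13)$$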


(We will simply say that $\{F_{\theta,m}^i\}$ converges to $\delta_{p_\theta^i}$.) Throughout it is implicitly assumed
that there is asymptotic agreement under $\delta_{p_\theta^i}$ (as in Corollaries 1 and 2). Therefore, as
$m \to \infty$, uncertainty about the interpretation of the signals disappears and we converge
to a world of asymptotic agreement. We write $\Pr^{i,m}$ for the ex ante probability under
$(F_{A,m}^i, F_{B,m}^i)$ and $\phi_{\infty,m}^i$ for the asymptotic posterior belief that $\theta = A$ under $(F_{A,m}^i, F_{B,m}^i)$.
Evidently, as $\{F_{\theta,m}^i\}$ converges to $\delta_{p_\theta^i}$, each individual becomes increasingly convinced
that he will learn the true state, so that learning is continuous at certainty. More
formally, for all $\epsilon > 0$,

$$\lim_{m\to\infty} \Pr^{i,m}\left(\phi_{\infty,m}^i > 1 - \epsilon \mid \theta = A\right) = 1.$$

This implies that when a model of learning under certainty is perturbed, deviations
from full learning will be small and each individual will attach a probability arbitrarily
close to 1 to the event that he will eventually learn the payoff-relevant state variable $\theta$. We next
define the continuity of asymptotic agreement at certainty.

Definition 1 For any given family $\{F_{\theta,m}^i\}$, we say that asymptotic agreement is
continuous at certainty under $\{F_{\theta,m}^i\}$, if for all $\epsilon > 0$ and for each $i = 1,2$,

$$\lim_{m\to\infty} \Pr^{i,m}\left(|\phi_{\infty,m}^1 - \phi_{\infty,m}^2| < \epsilon\right) = 1.$$

We say that asymptotic agreement is continuous at certainty at $(p_A^1, p_B^1, p_A^2, p_B^2)$
if it is continuous at certainty under every family $\{F_{\theta,m}^i\}$ converging to $\delta_{p_\theta^i}$.

Thus, continuity at certainty requires that as the family of subjective probability
distributions converges to a Dirac distribution (at which there is asymptotic agreement),
the ex ante probability that both individuals assign to the event that they will agree
asymptotically becomes arbitrarily close to 1. Hence, asymptotic agreement is
discontinuous at certainty at $(p_A^1, p_B^1, p_A^2, p_B^2)$ if there exists a family $\{F_{\theta,m}^i\}$ converging to $\delta_{p_\theta^i}$
and $\epsilon > 0$ such that

$$\lim_{m\to\infty} \Pr^{i,m}\left(|\phi_{\infty,m}^1 - \phi_{\infty,m}^2| > \epsilon\right) > 0$$

for $i = 1, 2$. We will next define a stronger notion of discontinuity.

Definition 2 We say that asymptotic agreement is strongly discontinuous at
certainty under $\{F_{\theta,m}^i\}$ if there exists $\epsilon > 0$ such that

$$\lim_{m\to\infty} \Pr^{i,m}\left(|\phi_{\infty,m}^1 - \phi_{\infty,m}^2| > \epsilon\right) = 1$$

for $i = 1,2$. We say that asymptotic agreement is strongly discontinuous at
certainty at $(p_A^1, p_B^1, p_A^2, p_B^2)$ if it is strongly discontinuous at certainty under some
family $\{F_{\theta,m}^i\}$ converging to $\delta_{p_\theta^i}$.

Strong discontinuity requires that even as we approach the world of learning
under certainty, asymptotic agreement will fail with probability approximately equal to 1
according to both individuals.

3.1      Discontinuity  of  Asymptotic  Agreement

The next theorem establishes the strong discontinuity of asymptotic agreement at
certainty.

Theorem 3 (Strong Discontinuity of Asymptotic Agreement) Asymptotic
agreement is strongly discontinuous at every $(p_A^1, p_B^1, p_A^2, p_B^2)$ with $p_\theta^i \in (1/2, 1)$ for all $(\theta, i)$.
Moreover, if $\pi^1 \neq \pi^2$, then there exist $\{F_{\theta,m}^i\}$ converging to $\delta_{p_\theta^i}$ and $Z > 0$ such that

$$|\phi_{\infty,m}^1(\rho(s)) - \phi_{\infty,m}^2(\rho(s))| > Z \quad \text{for all } m \in \mathbb{N} \text{ and } s \in S.$$

The proof of this theorem is provided below. Note that when $p_\theta^1 = p_\theta^2 = p_\theta$ for
each $\theta$, the limiting world is the canonical learning model (under certainty) described
in Savage's Theorem (Corollary 1): both individuals are certain that the probability
of observing signal $s = a$ is $p_A > 1/2$ if the state is $\theta = A$ and $1 - p_B$ if the state is
$\theta = B$ (i.e., each $F_\theta^i$ puts probability 1 on $p_\theta$). Therefore, this theorem establishes strong
discontinuity at certainty for the canonical learning model; even when we are arbitrarily
close to this world of certainty, the asymptotic gap in beliefs is bounded away from zero.
The condition $p_\theta^i \in (1/2, 1)$ is not needed (see Theorem 7 below). The proof is based on
the following example.

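Example 1 For some small $\epsilon, \lambda \in (0, 1)$, each individual $i$ thinks that with probability
$1 - \epsilon$, $p_\theta$ is in a $\lambda$-neighborhood of some $\hat p_\theta^i > (1 + \lambda)/2$, but with probability $\epsilon$, the
signals are not informative. More precisely, for $\hat p_\theta^i > (1 + \lambda)/2$ and $\lambda < |\hat p_\theta^1 - \hat p_\theta^2|$, we
have

$$f_\theta^i(p) = \begin{cases} \epsilon + (1 - \epsilon)/\lambda & \text{if } p \in (\hat p_\theta^i - \lambda/2,\ \hat p_\theta^i + \lambda/2), \\ \epsilon & \text{otherwise} \end{cases} \qquad (14)$$

for each $\theta$ and $i$. Now, by (4), the asymptotic likelihood ratio is

$$R^i(\rho(s)) = \begin{cases} \dfrac{\epsilon\lambda}{1 - \epsilon(1 - \lambda)} & \text{if } \rho(s) \in D_A^i \equiv (\hat p_A^i - \lambda/2,\ \hat p_A^i + \lambda/2), \\[2mm] \dfrac{1 - \epsilon(1 - \lambda)}{\epsilon\lambda} & \text{if } \rho(s) \in D_B^i \equiv (1 - \hat p_B^i - \lambda/2,\ 1 - \hat p_B^i + \lambda/2), \\[2mm] 1 & \text{otherwise.} \end{cases}$$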

This and other relevant functions are plotted in Figure 1 for $\epsilon \to 0$, $\lambda \to 0$. The
likelihood ratio $R^i(\rho(s))$ is 1 when $\rho(s)$ is small, takes a very high value at $1 - \hat p_B^i$, goes
down to 1 afterwards, becomes nearly zero around $\hat p_A^i$, and then jumps back to 1. By
Lemma 1, $\phi_\infty^i(s)$ will also be non-monotone: when $\rho(s)$ is small, the signals are not
informative, thus $\phi_\infty^i(s)$ is the same as the prior, $\pi^i$. In contrast, around $1 - \hat p_B^i$, the
signals become very informative suggesting that the state is $B$, thus $\phi_\infty^i(s) \cong 0$. After
this point, the signals become uninformative again and $\phi_\infty^i(s)$ goes back to $\pi^i$. Around
$\hat p_A^i$, the signals are again informative, but this time favoring state $A$, so $\phi_\infty^i(s) \cong 1$.
Finally, signals again become uninformative and $\phi_\infty^i(s)$ falls back to $\pi^i$. Intuitively,
when $\rho(s)$ is around $1 - \hat p_B^i$ or $\hat p_A^i$, the individual assigns very high probability to the
true state, but outside of this region, he sticks to his prior, concluding that the signals
are not informative.

Figure 1: The three panels show, respectively, the approximate values of $R^i(\rho)$, $\phi_\infty^i$, and
$|\phi_\infty^1 - \phi_\infty^2|$ as $\epsilon \to 0$, for $\hat p_A^i = \hat p_B^i = \hat p^i$.

The first important observation is that even though $\phi_\infty^i$ is equal to the prior for a large
range of limiting frequencies, as $\epsilon \to 0$ and $\lambda \to 0$ each individual attaches probability
1 to the event that he will learn $\theta$. This is because, as illustrated by the discussion after
Theorem 1, as $\epsilon \to 0$ and $\lambda \to 0$, each individual becomes convinced that the limiting
frequencies will be either $1 - \hat p_B^i$ or $\hat p_A^i$.

However, asymptotic learning is considerably weaker than asymptotic agreement.
Each individual also understands that since $\lambda < |\hat p_\theta^1 - \hat p_\theta^2|$, when the long-run frequency
is in a region where he learns that $\theta = A$, the other individual will conclude that the
signals are uninformative and adhere to his prior belief. Consequently, he expects the
posterior beliefs of the other individual to be always far from his. Put differently, as
$\epsilon \to 0$ and $\lambda \to 0$, each individual believes that he will learn the value of $\theta$ himself but
that the other individual will fail to learn, and thus attaches probability 1 to the event that
they disagree. This can be seen from the third panel of Figure 1; at each sample path
in $S$, at least one of the individuals will fail to learn, and the difference between their
limiting posteriors will be uniformly higher than the following "objective" bound

$$z \equiv \min\{\pi^1,\ \pi^2,\ 1 - \pi^1,\ 1 - \pi^2,\ |\pi^1 - \pi^2|\}.$$

When $\pi^1 = 1/3$ and $\pi^2 = 2/3$, this bound is equal to 1/3. In fact, the belief of
each individual regarding potential disagreement can be greater than this; each
individual believes that he will learn but the other individual will fail to do so.
Consequently, for each $i$, $\Pr^i(|\phi_\infty^1(s) - \phi_\infty^2(s)| \geq Z) \geq 1 - \epsilon$, where as $\epsilon \to 0$, $Z \to \bar z \equiv
\min\{\pi^1, \pi^2, 1 - \pi^1, 1 - \pi^2\}$. This "subjective" bound can be as high as 1/2.
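The behavior described in Example 1 can be reproduced with a few lines of code. The sketch below is illustrative and rests on assumptions: it takes the posterior to be $1/(1 + \frac{1-\pi}{\pi}R)$ with $R(\rho) = f_B(1-\rho)/f_A(\rho)$, uses the same center for $p_A$ and $p_B$ as in Figure 1, and picks hypothetical values $\epsilon = \lambda = 0.01$, centers 0.8 and 0.7, and priors 1/3 and 2/3.

```python
def density(p: float, p_hat: float, eps: float, lam: float) -> float:
    """Density of Example 1: a uniform floor eps plus a spike of mass 1 - eps near p_hat."""
    spike = (1.0 - eps) / lam if abs(p - p_hat) < lam / 2.0 else 0.0
    return eps + spike


def posterior(rho: float, prior: float, p_hat: float, eps: float, lam: float) -> float:
    """Limiting belief in state A; the same p_hat is used for both states."""
    ratio = density(1.0 - rho, p_hat, eps, lam) / density(rho, p_hat, eps, lam)
    return 1.0 / (1.0 + (1.0 - prior) / prior * ratio)


def min_gap(eps: float, lam: float, prior1: float, prior2: float,
            p_hat1: float, p_hat2: float, grid: int = 1000) -> float:
    """Smallest |phi1 - phi2| over a grid of limiting frequencies in [0, 1]."""
    gaps = []
    for k in range(grid + 1):
        rho = k / grid
        phi1 = posterior(rho, prior1, p_hat1, eps, lam)
        phi2 = posterior(rho, prior2, p_hat2, eps, lam)
        gaps.append(abs(phi1 - phi2))
    return min(gaps)


if __name__ == "__main__":
    # Hypothetical parameters with lam < |p_hat1 - p_hat2|.
    print(min_gap(eps=0.01, lam=0.01, prior1=1 / 3, prior2=2 / 3,
                  p_hat1=0.8, p_hat2=0.7))
```

With these values the smallest gap on the grid stays close to the objective bound of 1/3: wherever one individual becomes nearly certain, the other remains at his prior.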

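Proof of Theorem 3. We only consider the case $p_\theta^1 > p_\theta^2$ for $\theta = A, B$; the
other cases are identical. In Example 1, for each $m$, take $\epsilon = \lambda = \bar\epsilon/m$, $\hat p_\theta^1 = p_\theta^1 + \lambda$,
and $\hat p_\theta^2 = p_\theta^2 - \lambda$, where $\bar\epsilon$ is such that $1 - \phi_\infty^i(s) < (1 - \pi^j)/2$ for $\rho(s) \in D_A^i$ and
$\phi_\infty^i(s) < \pi^j/2$ for $\rho(s) \in D_B^i$ whenever $\epsilon = \lambda \leq \bar\epsilon$. Such $\bar\epsilon$ exists (by asymptotic
learning of $i$). By construction, each $F_{\theta,m}^i$ converges to $\delta_{p_\theta^i}$, and $|\hat p_\theta^1 - \hat p_\theta^2| > \lambda$ for each
$\theta$. To complete the proof, pick $Z = \bar z/2 > 0$, where $\bar z$ is the "subjective" bound from
Example 1. By choice of $\bar\epsilon$, $|\phi_{\infty,m}^1(s) - \phi_{\infty,m}^2(s)| > Z$
whenever $\rho(s) \in D_A^i \cup D_B^i$. But $\Pr^{i,m}(\rho(s) \in D_A^i \cup D_B^i) = 1 - \epsilon(1 - \lambda)$, which goes to 1 as
$m \to \infty$. Therefore,

$$\lim_{m\to\infty} \Pr^{i,m}\left(|\phi_{\infty,m}^1 - \phi_{\infty,m}^2| > Z\right) = 1. \qquad (15)$$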

To prove the last statement in the theorem, pick $Z = z/2$, which is positive when

In the example (and thus in the proof of Theorem 3), the likelihood ratio $R^i(\rho(s))$
and the asymptotic beliefs $\phi_\infty^i(s)$ are non-monotone in the frequency $\rho(s)$. This is a
natural outcome of uncertainty on conditional signal distributions (see the discussion at
the end of Section 2 and Figure 2 below). When $R^i$ is monotone and the amount of
uncertainty is small, at each state one of the individuals assigns high probability to the event that
both of them will learn the true state, and consequently asymptotic disagreement will be
small. Nevertheless, asymptotic agreement is still discontinuous at certainty when we
impose the monotone likelihood ratio property. This is shown in the next theorem.

Theorem 4 (Discontinuity of Asymptotic Agreement under Monotonicity)
For any $p_A^i, p_B^i > 1/2$, $i \in \{1, 2\}$, and $\pi^1, \pi^2 \in (0, 1)$, there exist a family $\{F_{\theta,m}^i\}$ and
$Z > 0$ such that:

1. for each $\theta \in \{A, B\}$ and $i = 1,2$, $F_{\theta,m}^i$ converges to $\delta_{p_\theta^i}$;

2. the likelihood ratio $R_m^i(\rho)$ is nonincreasing in $\rho$ for each $i$ and $m$; and

3. for each $i$,

$$\lim_{m\to\infty} \Pr^{i,m}\left(|\phi_{\infty,m}^1 - \phi_{\infty,m}^2| > Z\right) > 0. \qquad (16)$$

Proof.  See  the  Appendix.    ■ 

The monotonicity of the likelihood ratio has weakened the conclusion of Theorem
3: the limit in (16) is no longer equal to 1, so asymptotic agreement is
discontinuous at certainty, but not strongly so.

Note that in Theorems 3 and 4 the families $\{F_{\theta,m}^i\}$ leading to the discontinuity of
asymptotic agreement induce discontinuous likelihood ratios. This is not crucial for the
results, however, since smooth approximations to $F_{\theta,m}^i$ would ensure continuity of the
likelihood ratios as well. What is important is that the likelihood ratios under families
$\{F_{\theta,m}^i\}$ do not converge uniformly (instead, convergence is pointwise). We next impose
a uniform convergence assumption (as well as additional strong continuity assumptions)
and characterize the conditions for discontinuity of asymptotic agreement at certainty.

3.2      Agreement  and  Disagreement  with  Uniform  Convergence 

In this subsection, we consider a class of families $\{F_{\theta,m}^i\}$ converging uniformly to the
Dirac distribution $\delta_{\hat p^i}$ for some $\hat p^i \in (1/2, 1)$ and show that whether there is discontinuity
of asymptotic agreement at certainty depends on the tail properties of $\{F_{\theta,m}^i\}$.

We start our analysis by defining the family $\{F_{\theta,m}^i\}$, with a corresponding family
of subjective probability density functions $\{f_{\theta,m}^i\}$. The family is parameterized by a
determining density function $f$. We impose the following conditions on $f$:

(i) $f$ is strictly positive and symmetric around zero;

(ii) there exists $\bar x < \infty$ such that $f(x)$ is decreasing for all $x \geq \bar x$;

(iii)

$$\tilde R(x, y) \equiv \lim_{m\to\infty} \frac{f(mx)}{f(my)} \qquad (17)$$

exists in $[0, \infty]$ at all $(x, y) \in \mathbb{R}_+^2$.

Conditions (i) and (ii) are natural and serve to simplify the notation. Condition (iii)
introduces the function $\tilde R(x, y)$, which will arise naturally in the study of asymptotic
agreement and has a natural meaning in asymptotic statistics (see Definitions 1 and 2
below).
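To see what condition (iii) distinguishes, one can evaluate the ratio $f(mx)/f(my)$ for growing $m$ under two familiar densities. The sketch below is only illustrative: the helper names and the evaluation point $(x, y) = (0.5, 0.3)$ are hypothetical, and the ratio is computed in logs so that very small densities do not produce 0/0.

```python
import math


def log_normal_pdf(x: float) -> float:
    """Log of the standard normal density (rapidly varying tails)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def log_cauchy_pdf(x: float) -> float:
    """Log of the standard Cauchy density, i.e. Student t with one degree of freedom."""
    return -math.log(math.pi * (1.0 + x * x))


def tail_ratio(log_pdf, x: float, y: float, m: float) -> float:
    """f(m x) / f(m y), computed in logs.

    Intended for x >= y >= 0; for x < y the ratio can be too large for a float.
    """
    return math.exp(log_pdf(m * x) - log_pdf(m * y))


if __name__ == "__main__":
    x, y = 0.5, 0.3  # hypothetical evaluation point with x > y
    for m in (1, 10, 100, 1000):
        print(m, tail_ratio(log_normal_pdf, x, y, m), tail_ratio(log_cauchy_pdf, x, y, m))
```

For the normal density the ratio collapses to zero, while for the Cauchy density it settles at $(y/x)^2 = 0.36$, a strictly positive limit.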

In order to vary the amount of uncertainty, we consider mappings of the form $x \mapsto
(x - y)/m$, which scale down the real line around $y$ by the factor $1/m$. The family of
subjective densities for individuals' beliefs about $p_A$ and $p_B$, $\{f_{\theta,m}^i\}$, will be determined
by $f$ and the transformation $x \mapsto (x - \hat p^i)/m$.^ In particular, we consider the following
family of densities

$$f_{\theta,m}^i(p) = c^i(m)\, f(m(p - \hat p^i)) \qquad (18)$$

for each $\theta$ and $i$, where $c^i(m) \equiv 1/\int_0^1 f(m(p - \hat p^i))\,dp$ is a correction factor to ensure
that $f_{\theta,m}^i$ is a proper probability density function on $[0, 1]$ for each $m$. In this family of
subjective densities, the uncertainty about $p_A$ is scaled down by $1/m$, and $f_{\theta,m}^i$ converges
to the Dirac distribution $\delta_{\hat p^i}$ as $m \to \infty$, so that individual $i$ becomes sure about the
informativeness of the signals in the limit.

The next theorem characterizes the class of determining functions $f$ for which the
resulting family of the subjective densities $\{f_{\theta,m}^i\}$ leads to approximate asymptotic
agreement as the amount of uncertainty vanishes.

Theorem 5 (Characterization) Consider the family $\{F_{\theta,m}^i\}$ defined in (18) for some
$\hat p^i > 1/2$ and $f$ satisfying conditions (i)-(iii) above. Assume that $f(mx)/f(my)$
uniformly converges to $\tilde R(x, y)$ over a neighborhood of $(\hat p^1 + \hat p^2 - 1, |\hat p^1 - \hat p^2|)$.

1. If $\tilde R(\hat p^1 + \hat p^2 - 1, |\hat p^1 - \hat p^2|) = 0$, then agreement is continuous at certainty under
$\{F_{\theta,m}^i\}$.

2. If $\tilde R(\hat p^1 + \hat p^2 - 1, |\hat p^1 - \hat p^2|) \neq 0$, then agreement is strongly discontinuous at
certainty under $\{F_{\theta,m}^i\}$.

Proof. Both parts of the theorem are proved using the following claim.

Claim 2 $\lim_{m\to\infty} \left(\phi_{\infty,m}^i(\hat p^i) - \phi_{\infty,m}^j(\hat p^i)\right) = 0$ if and only if $\tilde R(\hat p^1 + \hat p^2 - 1, |\hat p^1 - \hat p^2|) =
0$ (where $\phi_{\infty,m}^i(\hat p^i)$ denotes beliefs evaluated under sample paths with $\rho = \hat p^i$).


^This formulation assumes that $\hat p_A^i$ and $\hat p_B^i$ are equal. We can easily assume these to be different, but
do not introduce this generality here to simplify the exposition. Theorem 8 allows for such differences
in the context of the more general model with multiple states and multiple signals.

(Proof of Claim) Let $R_m^i(\rho)$ be the asymptotic likelihood ratio as defined in (4)
associated with subjective density $f_{\theta,m}^i$. One can easily check that $\lim_{m\to\infty} R_m^i(\hat p^i) = 0$.
Hence, by (12), $\lim_{m\to\infty} \left(\phi_{\infty,m}^i(\hat p^i) - \phi_{\infty,m}^j(\hat p^i)\right) = 0$ if and only if $\lim_{m\to\infty} R_m^j(\hat p^i) = 0$.
By definition,

$$\lim_{m\to\infty} R_m^j(\hat p^i) = \lim_{m\to\infty} \frac{f(m(1 - \hat p^i - \hat p^j))}{f(m(\hat p^i - \hat p^j))} = \tilde R(1 - \hat p^i - \hat p^j,\ \hat p^i - \hat p^j) = \tilde R(\hat p^1 + \hat p^2 - 1,\ |\hat p^1 - \hat p^2|),$$

where the last equality follows by condition (i), the symmetry of the function $f$. This
establishes that $\lim_{m\to\infty} R_m^j(\hat p^i) = 0$ (and thus $\lim_{m\to\infty} \left(\phi_{\infty,m}^i(\hat p^i) - \phi_{\infty,m}^j(\hat p^i)\right) = 0$) if
and only if $\tilde R(\hat p^1 + \hat p^2 - 1, |\hat p^1 - \hat p^2|) = 0$. □
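The claim can be illustrated numerically. The sketch below assumes the posterior form $1/(1 + \frac{1-\pi}{\pi}R)$ and uses the fact that the correction factor in (18) cancels in the likelihood ratio. The centers 0.9 and 0.6, the common prior 1/2, and the function names are hypothetical choices made only for this illustration.

```python
import math


def log_normal_pdf(x: float) -> float:
    """Log of the standard normal density."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def log_cauchy_pdf(x: float) -> float:
    """Log of the standard Cauchy density."""
    return -math.log(math.pi * (1.0 + x * x))


def limit_posterior(log_pdf, rho: float, p_hat: float, prior: float, m: float) -> float:
    """Asymptotic belief in A for an individual whose density is centered at p_hat.

    R_m(rho) = f(m (1 - rho - p_hat)) / f(m (rho - p_hat)); the normalizing
    constant cancels. Intended for rho > 1/2, where the log ratio is not large.
    """
    log_ratio = log_pdf(m * (1.0 - rho - p_hat)) - log_pdf(m * (rho - p_hat))
    return 1.0 / (1.0 + (1.0 - prior) / prior * math.exp(log_ratio))


def gap_at_own_center(log_pdf, p_hat_i: float, p_hat_j: float, prior: float, m: float) -> float:
    """|phi_i - phi_j| on sample paths whose limiting frequency equals p_hat_i."""
    phi_i = limit_posterior(log_pdf, p_hat_i, p_hat_i, prior, m)
    phi_j = limit_posterior(log_pdf, p_hat_i, p_hat_j, prior, m)
    return abs(phi_i - phi_j)


if __name__ == "__main__":
    for m in (1, 10, 100, 10_000):
        print(m,
              gap_at_own_center(log_normal_pdf, 0.9, 0.6, 0.5, m),
              gap_at_own_center(log_cauchy_pdf, 0.9, 0.6, 0.5, m))
```

Under the normal density the gap vanishes as $m$ grows, whereas under the Cauchy density it approaches $1 - 1/(1 + 0.36) \approx 0.265$, in line with the tail ratio evaluated at $(0.5, 0.3)$.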

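(Proof of Part 1) Take any $\epsilon > 0$ and $\delta > 0$, and assume that $\tilde R(\hat p^1 + \hat p^2 - 1, |\hat p^1 - \hat p^2|) =
0$. We will show that there exists $\bar m \in \mathbb{N}$ such that

$$\Pr^i\left(\lim_{n\to\infty} |\phi_{n,m}^1(s) - \phi_{n,m}^2(s)| > \epsilon\right) < \delta \qquad (\forall m > \bar m,\ i = 1, 2).$$

By Lemma 1, there exists $\epsilon' > 0$ such that $\phi_{\infty,m}^i(\rho(s)) > 1 - \epsilon$ whenever $R_m^i(\rho(s)) < \epsilon'$.
There also exists $x_0$ such that

$$\Pr^i\left(\rho(s) \in (\hat p^i - x_0/m,\ \hat p^i + x_0/m) \mid \theta = A\right) = \int_{-x_0}^{x_0} f(x)\,dx > 1 - \delta. \qquad (19)$$

Let $\kappa \equiv \min_{x \in [-x_0, x_0]} f(x) > 0$. Since $f$ monotonically decreases to zero in the tails
(see (ii) above), there exists $x_1$ such that $f(x) < \epsilon'\kappa$ whenever $|x| > |x_1|$. Let $m_1 \equiv
(x_0 + x_1)/(2\hat p^i - 1) > 0$. Then, for any $m > m_1$ and $\rho(s) \in (\hat p^i - x_0/m, \hat p^i + x_0/m)$, we
have $|\rho(s) - 1 + \hat p^i| > x_1/m$, and hence

$$R_m^i(\rho(s)) = \frac{f(m(\rho(s) + \hat p^i - 1))}{f(m(\rho(s) - \hat p^i))} < \frac{\epsilon'\kappa}{\kappa} = \epsilon'.$$

Therefore, for all $m > m_1$ and $\rho(s) \in (\hat p^i - x_0/m, \hat p^i + x_0/m)$, we have that

$$\phi_{\infty,m}^i(\rho(s)) > 1 - \epsilon. \qquad (20)$$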

Again, by Lemma 1, there exists $\epsilon'' > 0$ such that $\phi_{\infty,m}^j(\rho(s)) > 1 - \epsilon$ whenever
$R_m^j(\rho(s)) < \epsilon''$. Now, for each $\rho(s)$,

$$\lim_{m\to\infty} R_m^j(\rho(s)) = \tilde R(\rho(s) + \hat p^j - 1,\ |\rho(s) - \hat p^j|). \qquad (21)$$

Moreover, by the uniform convergence assumption, there exists $\eta > 0$ such that $R_m^j(\rho(s))$
uniformly converges to $\tilde R(\rho(s) + \hat p^j - 1, |\rho(s) - \hat p^j|)$ on $(\hat p^i - \eta, \hat p^i + \eta)$ and

$$\tilde R(\rho(s) + \hat p^j - 1,\ |\rho(s) - \hat p^j|) < \epsilon''/2$$

for each $\rho(s)$ in $(\hat p^i - \eta, \hat p^i + \eta)$. Moreover, uniform convergence also implies that $\tilde R$
is continuous at $(\hat p^1 + \hat p^2 - 1, |\hat p^1 - \hat p^2|)$ (and in this part of the proof, by hypothesis,
it takes the value 0). Hence, there exists $m_2 < \infty$ such that for all $m > m_2$ and
$\rho(s) \in (\hat p^i - \eta, \hat p^i + \eta)$,

$$R_m^j(\rho(s)) < \tilde R(\rho(s) + \hat p^j - 1,\ |\rho(s) - \hat p^j|) + \epsilon''/2 < \epsilon''.$$

Therefore, for all $m > m_2$ and $\rho(s) \in (\hat p^i - \eta, \hat p^i + \eta)$, we have

$$\phi_{\infty,m}^j(\rho(s)) > 1 - \epsilon. \qquad (22)$$

Set  m  =  ma.x{mi,m.2,ri/xo}.  Then,  by  (20)  and  (22),  for  any  777.  >  fh  and  p{s)  e 
{p'-Xo/m,f  +  Xo/Tn),  we  have  \4'';^jn  (P  (s))  -  (tL.rn  {p  (s))\  <  e.  Then,  (19)  implies 
that  Pr^  (|(/.j,„,„  (p(5))  -  <^i,,„  (p(s))|  <  eie-  =  /I)  >  1  -  <5.  By  the  symmetry  of  A  and 
B,  this  establishes  that  Pr''  (10'^^  (p  (s))  -  (^  „  (p  (s))  |  <  e)  >  1  -  5  for  m  >  tti. 

(Proof of Part 2) We will find $\varepsilon > 0$ such that for each $\delta > 0$, there exists $\bar m \in \mathbb{N}$
such that

\[
\Pr^i\Big(\lim_{n\to\infty} |\phi^1_{n,m}(s) - \phi^2_{n,m}(s)| > \varepsilon\Big) > 1 - \delta \qquad (\forall m > \bar m,\; i = 1, 2).
\]

Since $\lim_{m\to\infty} R^j_m(p^i) = R(p^1 + p^2 - 1, |p^1 - p^2|) > 0$, $\lim_{m\to\infty} \phi^j_{\infty,m}(p^i) < 1$. We set
$\varepsilon = (1 - \lim_{m\to\infty} \phi^j_{\infty,m}(p^i))/2$ and use similar arguments to those in the proof of Part
1 to obtain the desired conclusion. ■

The main assumption in Theorem 5 is that the likelihood ratios $R^i_m(\rho(s))$ converge
uniformly to a limiting likelihood ratio, given by $R$.^^ In what follows, we say that
"noise vanishes uniformly" as a shorthand for the statement that the likelihood ratio
$R^i_m(\rho(s))$ converges uniformly to the limiting likelihood ratio. Theorem 5 provides a
complete characterization of the conditions for the continuity of asymptotic agreement at
certainty under this uniform convergence assumption. In particular, this theorem shows
that even when the likelihood ratios converge uniformly, asymptotic agreement may fail.
In contrast, Corollary 2 shows that there will always be asymptotic agreement in
the limit.

^^Note that the limiting likelihood ratio $R$ is not related to the likelihood ratio that applies in the
("limiting") model without uncertainty.

The theorem provides a simple condition on the tail of the distribution $f$ that
determines whether the asymptotic difference between the posteriors will be small as the
amount of uncertainty concerning the conditional distribution of signals vanishes
"uniformly". This condition can be expressed as:

\[
R(x, y) \equiv \lim_{m\to\infty} \frac{f(mx)}{f(my)} = 0 \tag{23}
\]

where $x \equiv p^1 + p^2 - 1 > |p^1 - p^2| \equiv y$. The theorem shows that if this condition is
satisfied, then as uncertainty about the informativeness of the signals disappears the
difference between the posteriors of the two individuals will become negligible. Notice
that condition (23) is symmetric and does not depend on $i$.

Intuitively, condition (23) is related to the beliefs of one individual on whether the
other individual will learn. As the amount of uncertainty concerning the conditional
distributions vanishes, we always have that $\lim_{m\to\infty} R^i_m(p^i) = 0$, so that each agent
believes that he will learn the value of $\theta$ with probability 1. Asymptotic agreement (or
lack thereof) depends on whether he also believes the other individual will learn the
value of $\theta$. When $R(x, y) = 0$, an individual who expects a limiting frequency of $p^2$ in
the asymptotic distribution will still learn the true state when the limiting frequency is
$p^1$. Therefore, individual 1, who is almost certain that the limiting frequency will be
$p^1$, still believes that individual 2 will reach the same inference as himself. In contrast,
when $R(x, y) \neq 0$, individual 1 is still certain that the limiting frequency of signals will be
$p^1$ and thus expects to learn himself. However, he understands that, when $R(x, y) \neq 0$,
an individual who expects a limiting frequency of $p^2$ will fail to learn the true state
when the limiting frequency happens to be $p^1$. Since he is almost certain that the limiting
frequency will be $p^1$ (or $1 - p^1$), he expects the other agent not to learn the truth and
thus he expects the disagreement between them to persist asymptotically.

The theorem exploits this result and the continuity of $R$ to show that the individuals
attach probability arbitrarily close to 1 to the event that the asymptotic difference
between their beliefs will disappear when (23) holds, and they attach probability 1
to asymptotic disagreement when (23) fails to hold. Thus the behavior of asymptotic
beliefs as uncertainty vanishes "uniformly" is completely determined by condition (23),
a condition on the tail of $f$.
When $y > 0$ (i.e., when $p^1 \neq p^2$), condition (23) is a familiar condition in statistics.
Whether it is satisfied depends on whether $f$ has rapidly-varying (thin) or
regularly-varying (thick) tails:

Definition 3 A density function $f$ has regularly-varying tails if it has unbounded
support and satisfies

\[
\lim_{m\to\infty} \frac{f(mx)}{f(m)} = H(x) \in \mathbb{R}
\]

for any $x > 0$.

The condition in Definition 3 that $H(x) \in \mathbb{R}$ is relatively weak, but nevertheless
has important implications. In particular, it implies that $H(x) = x^{-\alpha}$ for $\alpha \in (0, \infty)$.
This follows from the fact that in the limit, the function $H(\cdot)$ must be a solution to
the functional equation $H(x)H(y) = H(xy)$, which is only possible if $H(x) = x^{-\alpha}$ for
$\alpha \in (0, \infty)$.^^ Moreover, Seneta (1976) shows that the convergence in Definition 3 holds
locally uniformly, i.e., uniformly for $x$ in any compact subset of $(0, \infty)$. This implies that
if a density $f$ has regularly-varying tails, then the assumptions imposed in Theorem 5
(in particular, the uniform convergence assumption) are satisfied. In fact, in this case,
$R$ defined in (17) is given by

\[
R(x, y) = \left(\frac{x}{y}\right)^{-\alpha}
\]

and is everywhere continuous. As this expression suggests, densities with
regularly-varying tails behave approximately like power functions in the tails; indeed a density
$f(x)$ with regularly-varying tails can be written as $f(x) = \mathcal{L}(x)\,x^{-\alpha}$ for some
slowly-varying function $\mathcal{L}$ (with $\lim_{m\to\infty} \mathcal{L}(mx)/\mathcal{L}(m) = 1$). Many common distributions,
including the Pareto, log-normal, and t-distributions, have regularly-varying densities.
When $f$ has regularly-varying tails, $R(x, y) > 0$, and condition (23) cannot be satisfied.
We also define:

Definition 4 A density function $f$ has rapidly-varying tails if it satisfies

\[
\lim_{m\to\infty} \frac{f(mx)}{f(m)} = x^{-\infty} \equiv
\begin{cases}
0 & \text{if } x > 1 \\
1 & \text{if } x = 1 \\
\infty & \text{if } x < 1
\end{cases}
\]

for any $x > 0$.

^^To see this, note that since $\lim_{m\to\infty} (f(mx)/f(m)) = H(x) \in \mathbb{R}$, we have

\[
H(xy) = \lim_{m\to\infty} \frac{f(mxy)}{f(m)} = \lim_{m\to\infty} \left(\frac{f(mxy)}{f(my)} \cdot \frac{f(my)}{f(m)}\right) = H(x)\,H(y).
\]

See de Haan (1970) or Feller (1971).

As in Definition 3, the above convergence holds locally uniformly (uniformly in $x$
over any compact subset that excludes 1). Examples of densities with rapidly-varying
tails include the exponential and the normal densities. When $f$ has rapidly-varying tails,
$R(x, y) = (x/y)^{-\infty} = 0$, and condition (23) is satisfied.
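
To see numerically how condition (23) separates the two tail classes, the following Python sketch evaluates the ratio $f(mx)/f(my)$ for growing $m$. It is only an illustration under stated assumptions: the helper `tail_ratio`, the two density shapes `log_pareto_like` and `log_normal_kernel`, and the values of $x$, $y$ and $\alpha$ are hypothetical choices made for this example, not objects defined in the paper. Normalizing constants cancel in the ratio, so unnormalized log-densities suffice.

```python
import math


def tail_ratio(log_f, x, y, m):
    """Return f(m*x) / f(m*y), computed from log-densities for numerical stability."""
    return math.exp(log_f(m * x) - log_f(m * y))


def log_pareto_like(z, alpha=2.0):
    # Assumed power-law shape: f(z) proportional to (1 + |z|)^(-alpha).
    return -alpha * math.log1p(abs(z))


def log_normal_kernel(z):
    # Standard normal kernel: f(z) proportional to exp(-z^2 / 2).
    return -0.5 * z * z


x, y = 0.5, 0.1  # hypothetical values with x > y > 0
for m in (10, 100, 1000):
    # The power-law ratio approaches (x / y)^(-alpha); the normal ratio collapses to 0.
    print(m, tail_ratio(log_pareto_like, x, y, m), tail_ratio(log_normal_kernel, x, y, m))
```

Under these assumptions the power-law ratio settles near $(x/y)^{-\alpha}$, which is strictly positive, while the normal ratio vanishes, in line with the distinction drawn in Definitions 3 and 4.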

The next proposition formally states that under the assumptions that noise vanishes
uniformly and $p^1 \neq p^2$, whether agreement is continuous depends on whether the
family of subjective densities converging to "certainty" has regularly or rapidly-varying
tails:

Proposition 1 (Tail Properties and Asymptotic Disagreement Under Uniform
Convergence) Suppose that the conditions in Theorem 5 are satisfied and that
$p^1 \neq p^2$. Then,

1. If $f$ has regularly-varying tails, then agreement is strongly discontinuous at certainty
under $\{F^i_{\theta,m}\}$.

2. If $f$ has rapidly-varying tails, then agreement is continuous at certainty
under $\{F^i_{\theta,m}\}$.

Proof. When $f$ has regularly or rapidly-varying tails, the uniform convergence
assumption is satisfied, and the proposition follows from Definitions 3 and 4 and from Theorem
5. ■

Returning to the intuition above, Proposition 1 and the previous definitions make
it clear that the failure of asymptotic agreement, under the assumption that $R^i_m(\rho)$
converges to $R$ uniformly, is related to disagreement between the two individuals about
limiting frequencies, i.e., $p^1 \neq p^2$, together with sufficiently thick tails of the subjective
probability distribution so that an individual who expects $p^2$ should have sufficient
uncertainty when confronted with a limiting frequency of $p^1$. Along the lines of the intuition
given there, this is sufficient for both individuals to believe that they will learn the true
value of $\theta$ themselves, but that the other individual will fail to do so. Rapidly-varying
tails imply that individuals become relatively certain of their model of the world and
thus when individual $i$ observes a limiting frequency $\rho$ close to, but different from $p^i$, he
will interpret this as being driven by sampling variation and attach a high probability
to $\theta = A$. This will guarantee asymptotic agreement between the two individuals. In
contrast, with regularly-varying tails, even under the uniform convergence assumptions,
limiting frequencies different from $p^i$ will be interpreted not as sampling variation, but as
potential evidence for $\theta = B$, preventing asymptotic agreement. The following example
provides a simple illustration of part 1 of Proposition 1.

Example 2 Let $f$ be the Pareto distribution and $\pi^1 = \pi^2 = 1/2$. The likelihood ratio
is

\[
R^i_m(\rho(s)) = \left(\frac{\rho(s) + p^i - 1}{|\rho(s) - p^i|}\right)^{-\alpha}
\]

and the asymptotic probability of $\theta = A$ is

\[
\phi^i_{\infty,m}(\rho(s)) = \frac{|\rho(s) - p^i|^{-\alpha}}{|\rho(s) - p^i|^{-\alpha} + (\rho(s) + p^i - 1)^{-\alpha}}
\]

for all $m$. (These expressions hold in the limit $m \to \infty$ under any $f$ with
regularly-varying tails.) As illustrated in Figure 2, in this case $\phi^i_{\infty,m}$ is not monotone. To see the
magnitude of asymptotic disagreement, consider $\rho(s) = p^i$. In that case, $\phi^i_{\infty,m}(\rho(s))$
is approximately 1, and $\phi^j_{\infty,m}(\rho(s))$ is approximately $y^{-\alpha}/(x^{-\alpha} + y^{-\alpha})$. Hence, both
individuals believe that the difference between their asymptotic posteriors will be

\[
\left|\phi^1_{\infty,m} - \phi^2_{\infty,m}\right| \cong \frac{x^{-\alpha}}{x^{-\alpha} + y^{-\alpha}}.
\]

This asymptotic difference is increasing with the difference $y \equiv |p^1 - p^2|$, which
corresponds to the difference in the individuals' views on which frequencies of signals are most
likely. It is also clear from this expression that this asymptotic difference will converge
to zero as $y \to 0$ (i.e., as $p^1 \to p^2$).
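
The magnitudes in Example 2 can be tabulated with a short Python sketch. It assumes equal priors and the power-law limiting likelihood ratio with tail index $\alpha$; the function names `limiting_posterior` and `disagreement_gap` and the numerical inputs are hypothetical, introduced only for this illustration. The ratios are written with positive powers (an algebraically equivalent form) so that the case $y = 0$ does not divide by zero.

```python
def limiting_posterior(rho, p_center, alpha):
    """Limiting probability of theta = A under equal priors and a power-law tail.

    Equals b / (a + b) with a = |rho + p_center - 1|^(-alpha) and
    b = |rho - p_center|^(-alpha), rewritten with positive powers.
    """
    a = abs(rho + p_center - 1.0) ** alpha
    b = abs(rho - p_center) ** alpha
    return a / (a + b)


def disagreement_gap(p1, p2, alpha):
    """x^(-alpha) / (x^(-alpha) + y^(-alpha)) with x = p1 + p2 - 1 and y = |p1 - p2|."""
    x = p1 + p2 - 1.0
    y = abs(p1 - p2)
    return y ** alpha / (x ** alpha + y ** alpha)


alpha = 2.0  # assumed tail index
for p2 in (0.75, 0.70, 0.60):
    # With p1 fixed at 0.75, the gap grows as p2 moves away from p1.
    print(p2, disagreement_gap(0.75, p2, alpha))
```

With these assumed inputs the gap is zero when the two centers coincide and grows as they move apart, which is the comparative static described in the example.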

The last statement in the example is in fact generally true when noise vanishes
uniformly and $R$ is continuous. This is explored in the next proposition.

Proposition 2 (Limits to Asymptotic Disagreement) In Theorem 5, in addition,
assume that $R$ is continuous on the set $D = \{(x, y) \mid -1 \leq x \leq 1, |y| \leq \bar y\}$ for some
$\bar y > 0$. Then for every $\varepsilon > 0$ and $\delta > 0$, there exist $\lambda > 0$ and $\bar m \in (0, \infty)$ such that
whenever $|p^1 - p^2| < \lambda$,

\[
\Pr^i\Big(\lim_{n\to\infty} |\phi^1_{n,m} - \phi^2_{n,m}| > \varepsilon\Big) < \delta \qquad (\forall m > \bar m,\; i = 1, 2).
\]
Figure 2: $\lim_{n\to\infty} \phi^i_n(s)$ for Pareto distribution as a function of $\rho(s)$ [$\alpha$ = …, $p^i = 3/4$].


Proof. To prove this proposition, we modify the proof of Part 1 of Theorem 5
and use the notation in that proof. Since $R$ is continuous on the compact set $D$ and
$R(x, 0) = 0$ for each $x$, there exists $\lambda > 0$ such that $R(p^1 + p^2 - 1, |p^1 - p^2|) < \varepsilon''/4$
whenever $|p^1 - p^2| < \lambda$. Fix any such $p^1$ and $p^2$. Then, by the uniform convergence
assumption, there exists $\eta > 0$ such that $R^j_m(\rho(s))$ uniformly converges to
$R(\rho(s) + p^j - 1, |\rho(s) - p^j|)$ on $(p^i - \eta, p^i + \eta)$ and

\[
R(\rho(s) + p^j - 1,\; |\rho(s) - p^j|) < \varepsilon''/2
\]

for each $\rho(s)$ in $(p^i - \eta, p^i + \eta)$. The rest of the proof is identical to the proof of Part 1
in Theorem 5. ■

This proposition implies that in the case where noise vanishes uniformly and the
individuals are almost certain about the informativeness of signals, any significant
difference in their asymptotic beliefs must be due to differences in their subjective densities
regarding the signal distribution; that is, $|p^1 - p^2|$ cannot be too small. In particular,
when $p^1 = p^2$, we must have $R(x, y) = 0$, and thus, from Theorem 5, there will be
convergence to asymptotic agreement. Notably, however, the requirement that $p^1 = p^2$
is rather strong. For example, Corollary 2 established that under certainty there is
asymptotic agreement for all $p^1, p^2 > 1/2$.

In closing this section, let us reiterate that the key assumption in Proposition 2 is that
$R^i_m(\rho)$ uniformly converges to a continuous limiting likelihood ratio $R$. In contrast, recall
that Theorem 3 establishes that a slight uncertainty may lead to substantial asymptotic
disagreement with nearly probability 1 even when $p^1 = p^2$. The crucial difference is that
in Theorem 3 the likelihood ratios converge to a continuous limiting likelihood function
pointwise, but not uniformly.

4     Generalizations 

The previous section provided our main results in an environment with two states and
two signals. In this section, we show that the results from the previous two sections
generalize to an environment with $K \geq 2$ states and $L \geq K$ signals. All the proofs for
this section are contained in the Appendix and, to economize on space, we do not provide
the analog of Theorem 1.

Suppose $\theta \in \Theta$, where $\Theta \equiv \{A_1, ..., A_K\}$ is a set containing $K \geq 2$ distinct elements.
We refer to a generic element of the set by $A_k$. Similarly, let $s_t \in \{a_1, ..., a_L\}$, with
$L \geq K$ signal values. As before, define $s \equiv \{s_t\}_{t=1}^{\infty}$, and for each $l = 1, ..., L$, let

\[
r_{n,l}(s) \equiv \#\{t \leq n \mid s_t = a_l\}
\]

be the number of times the signal $s_t = a_l$ out of first $n$ signals. Once again, the
strong law of large numbers implies that, according to both individuals, for each
$l = 1, ..., L$, $r_{n,l}(s)/n$ almost surely converges to some $\rho_l(s) \in [0, 1]$ with $\sum_{l=1}^{L} \rho_l(s) = 1$.
Define $\rho(s) \in \Delta(L)$ as the vector $\rho(s) \equiv (\rho_1(s), ..., \rho_L(s))$, where
$\Delta(L) \equiv \{p = (p_1, ..., p_L) \in [0, 1]^L : \sum_{l=1}^{L} p_l = 1\}$, and let the set $\bar S$ be

\[
\bar S \equiv \{s \in S : \lim_{n\to\infty} r_{n,l}(s)/n \text{ exists for each } l = 1, ..., L\}. \tag{24}
\]
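
As a concrete reading of the counts $r_{n,l}(s)$ and the frequency vector they induce, the Python sketch below computes $r_{n,l}(s)/n$ for a finite signal sequence. The function name `empirical_frequencies` and the toy alphabet are assumptions for illustration only; the paper works with the limit as $n \to \infty$, which a finite sample can only approximate.

```python
from collections import Counter


def empirical_frequencies(signals, alphabet):
    """Return [r_{n,l} / n for each signal value a_l], where n = len(signals)."""
    n = len(signals)
    counts = Counter(signals)  # counts[a] is the number of t <= n with s_t = a
    return [counts[a] / n for a in alphabet]


# Hypothetical sequence over the signal values a_1 = "a", a_2 = "b", a_3 = "c".
freqs = empirical_frequencies(list("abca"), ["a", "b", "c"])
print(freqs)       # one frequency per signal value
print(sum(freqs))  # the frequencies lie in the simplex, so they sum to 1
```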


With analogy to the two-state-two-signal model in Section 2, let $\pi^i_k > 0$ be the prior
probability individual $i$ assigns to $\theta = A_k$, $\pi^i \equiv (\pi^i_1, ..., \pi^i_K)$, and $p_{\theta,l}$ be the frequency
of observing signal $s = a_l$ when the true state is $\theta$. When players are certain about the $p_{\theta,l}$'s
as in usual models, immediate generalizations of Theorems 1 and 1 apply. With analogy
to before, we define $F^i_\theta$ as the joint subjective probability distribution of conditional
frequencies $p_\theta \equiv (p_{\theta,1}, ..., p_{\theta,L})$ according to individual $i$. Since our focus is learning
under uncertainty, we impose an assumption similar to Assumption 1.

Assumption 2 For each $i$ and $\theta$, the distribution $F^i_\theta$ over $\Delta(L)$ has a continuous,
non-zero and finite density $f^i_\theta$ over $\Delta(L)$.

This assumption can be weakened along the lines discussed in Remark 2 above.



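We also define $\phi^i_{k,n}(s) \equiv \Pr^i(\theta = A_k \mid \{s_t\}_{t=0}^{n})$ for each $k = 1, ..., K$ as the posterior
probability that $\theta = A_k$ after observing the sequence of signals $\{s_t\}_{t=0}^{n}$, and

\[
\phi^i_{k,\infty}(\rho(s)) \equiv \lim_{n\to\infty} \phi^i_{k,n}(s).
\]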

Given this structure, it is straightforward to generalize the results in Section 2. Let us
now define the transformation $T_k : \mathbb{R}^K \to \mathbb{R}^{K-1}$, such that

\[
T_k(x) = \left(\frac{x_{k'}}{x_k};\; k' \in \{1, ..., K\} \setminus k\right).
\]

Here $T_k(x)$ is taken as a column vector. This transformation will play a useful role in the
theorems and the proofs. In particular, this transformation will be applied to the vector
$\pi^i$ of priors to determine the ratio of priors assigned to the different states by individual $i$.
Let us also define the norm $\|x\| = \max_l |x_l|$ for $x = (x_1, ..., x_L) \in \mathbb{R}^L$.
The next lemma generalizes Lemma 1 (proof omitted).

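Lemma 2 Suppose Assumption 2 holds. Then for all $s \in \bar S$,

\[
\phi^i_{k,\infty}(\rho(s)) = \frac{1}{1 + T_k(\pi^i)'\, T_k\big((f^i_\theta(\rho(s)))_{\theta\in\Theta}\big)}.
\]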
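
The formula in Lemma 2 is Bayes' rule written in terms of prior ratios and density ratios. The Python sketch below evaluates it for one individual; the helper names `T` and `asymptotic_posterior` and the numerical priors and density values are hypothetical, and states are indexed from 0 rather than 1. The list `densities` stands for the values $f^i_\theta(\rho(s))$ at an observed frequency vector, taken here as given numbers.

```python
def T(k, x):
    """T_k(x): the ratios x[j] / x[k] for all j != k."""
    return [x[j] / x[k] for j in range(len(x)) if j != k]


def asymptotic_posterior(k, priors, densities):
    """Return 1 / (1 + T_k(priors)' T_k(densities))."""
    dot = sum(a * b for a, b in zip(T(k, priors), T(k, densities)))
    return 1.0 / (1.0 + dot)


priors = [0.5, 0.3, 0.2]     # hypothetical prior over K = 3 states
densities = [2.0, 1.0, 4.0]  # hypothetical subjective density values at rho(s)
posteriors = [asymptotic_posterior(k, priors, densities) for k in range(3)]
print(posteriors)  # matches priors[k] * densities[k] normalized over k
```

Because every density value is positive and finite under Assumption 2, each of these posteriors is strictly between 0 and 1, which is the source of the lack of asymptotic learning.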


Our first theorem in this section parallels Theorem 2 and shows that under
Assumption 2 there will be lack of asymptotic learning, and under a relatively weak additional
condition, there will also be asymptotic disagreement.

Theorem 6 (Generalized Lack of Asymptotic Learning and Agreement)
Suppose Assumption 2 holds for $i = 1, 2$, then for each $k = 1, ..., K$, and for each $i = 1, 2$,

1. $\Pr^i\big(\phi^i_{k,\infty}(\rho(s)) \neq 1 \mid \theta = A_k\big) = 1$, and

2. $\Pr^i\big(|\phi^1_{k,\infty}(\rho(s)) - \phi^2_{k,\infty}(\rho(s))| \neq 0\big) = 1$ whenever
$\Pr^i\big((T_k(\pi^1) - T_k(\pi^2))'\, T_k(f^i(\rho(s))) = 0\big) = 0$ and $F^1_\theta = F^2_\theta$ for each $\theta \in \Theta$.

The additional condition in part 2 of Theorem 6, that
$\Pr^i\big((T_k(\pi^1) - T_k(\pi^2))'\, T_k(f^i(\rho(s))) = 0\big) = 0$, plays the role of differences in priors
in Theorem 2 (here "'" denotes the transpose of the vector in question). In particular, if this
condition did not hold, then at some $\rho(s)$, the relative asymptotic likelihood of some states
could be the same according to two individuals with different priors and they would interpret
at least some sequences of signals in a similar manner and achieve asymptotic agreement.
It is important to note that the condition that
$\Pr^i\big((T_k(\pi^1) - T_k(\pi^2))'\, T_k(f^i(\rho(s))) = 0\big) = 0$ is relatively weak
and holds generically, i.e., if it did not hold, a small perturbation of $\pi^1$ or $\pi^2$ would
restore it.^^ Part 2 of Theorem 6 therefore implies that asymptotic disagreement
occurs generically.

We next define continuity and discontinuity of asymptotic agreement at certainty in
this more general case. A family of subjective probability distributions is again denoted
by $\{F^i_{\theta,m}\}$. Throughout, $\{F^i_{\theta,m}\}$ converge to a Dirac distribution $\delta_{p^i_\theta}$, where $p^i_\theta \in \Delta(L)$,
and $\delta_{p^i_\theta}$ is such that there is asymptotic agreement (that is, there is asymptotic agreement
when learning is under certainty). The corresponding asymptotic beliefs are denoted
by $\phi^1_{k,\infty,m}$ and $\phi^2_{k,\infty,m}$, for $k = 1, ..., K$ and $m \in \mathbb{N}$.

Definition 5 Asymptotic agreement is continuous at certainty under family $\{F^i_{\theta,m}\}$
if for all $\varepsilon > 0$, for each $k = 1, ..., K$ and for each $i = 1, 2$,

\[
\lim_{m\to\infty} \Pr^{i,m}\big(|\phi^1_{k,\infty,m} - \phi^2_{k,\infty,m}| < \varepsilon\big) = 1.
\]

Asymptotic agreement is continuous at certainty at $(p^1, p^2) \in \Delta(L)^{2K}$ if it is
continuous at certainty under all families $\{F^i_{\theta,m}\}$ converging to $\delta_{p^i_\theta}$.

Definition 6 Asymptotic agreement is strongly discontinuous at certainty under
family $\{F^i_{\theta,m}\}$ if there exists $\varepsilon > 0$ such that

\[
\lim_{m\to\infty} \Pr^{i,m}\big(|\phi^1_{k,\infty,m} - \phi^2_{k,\infty,m}| > \varepsilon\big) = 1
\]

for each $k = 1, ..., K$ and each $i = 1, 2$. Asymptotic agreement is strongly
discontinuous at certainty at $(p^1, p^2) \in \Delta(L)^{2K}$ if asymptotic agreement is strongly
discontinuous at certainty under some family $\{F^i_{\theta,m}\}$ converging to $\delta_{p^i_\theta}$.

The next result generalizes Theorem 3:


^^More formally, the set of solutions $S \equiv \{(\pi^1, \pi^2, \rho) \in \Delta(L)^2 : (T_k(\pi^1) - T_k(\pi^2))'\, T_k(f^i(\rho)) = 0\}$
has Lebesgue measure 0. This is a consequence of the Preimage Theorem and Sard's Theorem in
differential topology (see, for example, Guillemin and Pollack, 1974, pp. 21 and 39). The Preimage
Theorem implies that if $y$ is a regular value of a map $f : X \to Y$, then $f^{-1}(y)$ is a submanifold of $X$
with dimension equal to $\dim X - \dim Y$. In our context, this implies that if 0 is a regular value of the
map $(T_k(\pi^1) - T_k(\pi^2))'\, T_k(f^i(\rho))$, then the set $S$ is a two dimensional submanifold of the ambient
space and thus has Lebesgue measure 0. Sard's theorem implies that 0 is generically a regular value.
Theorem 7 (Generalized Strong Discontinuity of Asymptotic Agreement)
Asymptotic agreement is strongly discontinuous at each $(p^1, p^2) \in \Delta(L)^{2K}$.

Towards generalizing Theorem 5, we now formally present the appropriate families
of probability densities and introduce the necessary notation:

Assumption 3 For each $\theta \in \Theta$ and $m \in \mathbb{N}$, let the subjective density $f^i_{\theta,m}$ be defined by

\[
f^i_{\theta,m}(p) = c(i, \theta, m)\, f\big(m(p - p(i, \theta))\big)
\]

where $c(i, \theta, m) \equiv 1/\int_{p \in \Delta(L)} f(m(p - p(i, \theta)))\,dp$, $p(i, \theta) \in \Delta(L)$ with $p(i, \theta) \neq p(i, \theta')$
whenever $\theta \neq \theta'$, and $f : \mathbb{R}^L \to \mathbb{R}$ is a positive, continuous probability density function
that satisfies the following conditions:

(i) $\lim_{h\to\infty} \max_{\{x : \|x\| \geq h\}} f(x) = 0$,

(ii)
\[
R(x, y) \equiv \lim_{m\to\infty} \frac{f(mx)}{f(my)} \tag{25}
\]
exists at all $x, y$, and

(iii) convergence in (25) holds uniformly over a neighborhood of each
$(p(i, \theta) - p(j, \theta'),\; p(i, \theta) - p(j, \theta))$.

Writing $\phi^i_{k,\infty,m}(\rho(s)) \equiv \lim_{n\to\infty} \phi^i_{k,n,m}(s)$ for the asymptotic posterior of individual
$i$ with subjective density $f^i_{\theta,m}$, we are now ready to state the generalization of Theorem
5.

Theorem 8 (Generalized Asymptotic Agreement and Disagreement Under
Uniform Convergence) Under Assumption 3, the following are true:

1. Suppose that $R(p(i, \theta) - p(j, \theta'),\; p(i, \theta) - p(j, \theta)) = 0$ for each distinct $\theta$ and $\theta'$.
Then, asymptotic agreement is continuous under $\{F^i_{\theta,m}\}$.

2. Suppose that $R(p(i, \theta) - p(j, \theta'),\; p(i, \theta) - p(j, \theta)) \neq 0$ for each distinct $\theta$ and $\theta'$.
Then, asymptotic agreement is strongly discontinuous under $\{F^i_{\theta,m}\}$.
These theorems therefore show that the results about lack of asymptotic learning and
asymptotic agreement derived in the previous section do not depend on the assumption
that there are only two states and binary signals. It is also straightforward to generalize
Propositions 1 and 2 to the case with multiple states and signals; we omit this to avoid
repetition.

We assumed that both the number of signal values and the number of states are finite.
This assumption can be dropped at the expense of introducing technical issues that are
not central to our focus here.

5      Concluding  Remarks 

The standard approach in game theory and economic modeling assumes that
individuals have a "common prior," meaning that they have beliefs consistent with each other
regarding the game forms, institutions, and possible distributions of payoff-relevant
parameters. This presumption is often justified by the argument that sufficient common
experiences and observations, either through individual observations or transmission of
information from others, will eliminate disagreements, taking agents towards common
priors. This presumption receives support from a number of well-known theorems in
statistics, such as Savage (1954) and Blackwell and Dubins (1962).

Nevertheless, existing theorems apply to environments in which learning occurs
under certainty, that is, individuals are certain about the meaning of different signals.
Certainty is sufficient to ensure that payoff-relevant variables can be identified from
limiting frequencies of signals. In many situations, individuals are not only learning about
a payoff-relevant parameter but also about the interpretation of different signals, i.e.,
learning takes place under uncertainty. For example, many signals favoring a particular
interpretation might make individuals suspicious that the signals come from a biased
source. This may prevent full identification (in the standard sense of the term in
econometrics and statistics). In such situations, information will be useful to individuals but
may not lead to full learning.

This paper investigates the conditions under which learning under uncertainty will
take individuals towards common priors and asymptotic agreement. We consider an
environment in which two individuals with different priors observe the same infinite
sequence of signals informative about some underlying parameter. Learning is under
uncertainty, however, because each individual has a non-degenerate subjective probability
distribution over the likelihood of different signals given the values of the parameter.
When subjective probability distributions of both individuals have full support, they
will never agree, even after observing the same infinite sequence of signals.

Our main results provide conditions under which asymptotic agreement is fragile
or discontinuous at certainty (meaning that as the amount of uncertainty in the
environment diminishes, we remain away from asymptotic agreement). We first show that
asymptotic agreement is discontinuous at certainty for every model. In particular, a
vanishingly small amount of uncertainty about the signal distribution can guarantee that
both individuals attach probability arbitrarily close to 1 to the event that they will
asymptotically disagree. Under additional strong continuity and uniform convergence assumptions,
we also characterize the conditions under which asymptotic agreement is continuous at
certainty. Even under these assumptions, asymptotic disagreement may prevail as the
amount of uncertainty vanishes, provided that the family of subjective distributions has
regularly-varying tails (such as for the Pareto, the log-normal or the t-distributions). In
contrast, with rapidly-varying tails (such as the normal and the exponential
distributions), convergence to certainty leads to asymptotic agreement.

Lack of common beliefs and common priors has important implications for economic
behavior in a range of circumstances. The type of learning outlined in this paper interacts
with economic behavior in many different situations. The companion paper, Acemoglu,
Chernozhukov and Yildiz (2008), illustrates the influence of learning under uncertainty
and lack of asymptotic agreement on games of coordination, games of common interest,
bargaining, asset trading and games of communication.
6     Appendix: Omitted Proofs

Proof of Lemma 1. Write

\[
\frac{\Pr^i(r_n \mid \theta = B)}{\Pr^i(r_n \mid \theta = A)}
= \frac{\int_0^1 p^{r_n}(1-p)^{n-r_n} f^i_B(1-p)\,dp}{\int_0^1 p^{r_n}(1-p)^{n-r_n} f^i_A(p)\,dp}
= \frac{\dfrac{\int_0^1 p^{r_n}(1-p)^{n-r_n} f^i_B(1-p)\,dp}{\int_0^1 p^{r_n}(1-p)^{n-r_n}\,dp}}
       {\dfrac{\int_0^1 p^{r_n}(1-p)^{n-r_n} f^i_A(p)\,dp}{\int_0^1 p^{r_n}(1-p)^{n-r_n}\,dp}}
= \frac{E^{\lambda}[f^i_B(1-p) \mid r_n]}{E^{\lambda}[f^i_A(p) \mid r_n]}.
\]

Here, the second equality is obtained by dividing the numerator and the denominator by the
same term. The resulting expression on the numerator is the conditional expectation of
$f^i_B(1-p)$ given $r_n$ under the flat (Lebesgue) prior on $p$ and the Bernoulli distribution on
$\{s_t\}_{t=0}^{n}$. Denoting this by $E^{\lambda}[f^i_B(1-p) \mid r_n]$, and the denominator, which is similarly defined as
the conditional expectation of $f^i_A(p)$, by $E^{\lambda}[f^i_A(p) \mid r_n]$, we obtain the last equality. By Doob's
consistency theorem for Bayesian posterior expectation of the parameter, as $r_n/n \to \rho$, we have
that $E^{\lambda}[f^i_B(1-p) \mid r_n] \to f^i_B(1-\rho)$ and $E^{\lambda}[f^i_A(p) \mid r_n] \to f^i_A(\rho)$ (see, e.g., Doob, 1949, Ghosh
and Ramamoorthi, 2003, Theorem 1.3.2). This establishes

\[
\frac{\Pr^i(r_n \mid \theta = B)}{\Pr^i(r_n \mid \theta = A)} \to R^i(\rho),
\]

as defined in (4). Equation (3) then follows from (2). ■
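
The convergence used in the proof of Lemma 1 can be checked numerically on a toy case. The Python sketch below approximates the two integrals with a midpoint rule and compares their ratio with $f_B(1-\rho)/f_A(\rho)$. Everything specific here is an assumption made for illustration: the function name `likelihood_ratio`, the grid size, the sample size, and the choice $f_A(p) = f_B(p) = 2p$, for which the limiting ratio is $(1-\rho)/\rho$.

```python
import math


def likelihood_ratio(n, r, f_A, f_B, grid=20000):
    """Midpoint-rule approximation of Pr(r_n | theta = B) / Pr(r_n | theta = A).

    Requires 0 < r < n. The Bernoulli likelihood is rescaled by its peak value
    so that the weights stay within floating-point range.
    """
    p_hat = r / n
    peak = r * math.log(p_hat) + (n - r) * math.log(1.0 - p_hat)
    num = den = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        w = math.exp(r * math.log(p) + (n - r) * math.log(1.0 - p) - peak)
        num += w * f_B(1.0 - p)  # integrand of the numerator
        den += w * f_A(p)        # integrand of the denominator
    return num / den


def f_A(p):
    return 2.0 * p  # assumed subjective density on [0, 1]


f_B = f_A  # same assumed shape for the other state

n, r = 2000, 1500            # hypothetical sample with r / n = 0.75
rho = r / n
print(likelihood_ratio(n, r, f_A, f_B), f_B(1.0 - rho) / f_A(rho))
```

For this assumed pair of densities the finite-sample ratio is already close to the limiting value at $n = 2000$; other smooth positive densities could be substituted in the same way.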


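Proof of Theorem 4. For each $m \geq 1$, let

\[
f^{i,m}_\theta(p) =
\begin{cases}
x_\theta/\lambda & \text{if } p \in [p^i_\theta - \lambda/2,\; p^i_\theta + \lambda/2] \\
\varepsilon^3 & \text{if } p < 1 - p^i_{\theta'} - \lambda/2, \\
\varepsilon & \text{otherwise,}
\end{cases}
\]

where $\theta' \neq \theta$, $\varepsilon = \lambda = 1/m$, $p^1_A = p_A + \lambda$, $p^1_B = p_B - \lambda$, $p^2_A = p_A - \lambda$, $p^2_B = p_B + \lambda$, and
$x_\theta = 1 - \varepsilon\,(p^i_{\theta'} - \lambda/2) - \varepsilon^3 (1 - p^i_{\theta'} - \lambda/2) \in (0, 1)$. Here, $x_\theta$ is close to 1 for large $m$. Then,

\[
R^{i,m}(\rho) =
\begin{cases}
1/\varepsilon^2 & \text{if } \rho < 1 - p^i_B - \lambda/2, \\
x_B/\varepsilon^2 & \text{if } 1 - p^i_B - \lambda/2 \leq \rho \leq 1 - p^i_B + \lambda/2, \\
1 & \text{if } 1 - p^i_B + \lambda/2 < \rho < p^i_A - \lambda/2, \\
\varepsilon^2/x_A & \text{if } p^i_A - \lambda/2 \leq \rho \leq p^i_A + \lambda/2, \\
\varepsilon^2 & \text{if } \rho > p^i_A + \lambda/2,
\end{cases}
\]

which is clearly decreasing when $m$ is large. For $\varepsilon \cong 0$, we have

\[
R^{i,m}(\rho) \cong
\begin{cases}
\infty & \text{if } \rho < 1 - p^i_B + \lambda/2, \\
1 & \text{if } 1 - p^i_B + \lambda/2 < \rho < p^i_A - \lambda/2, \\
0 & \text{if } \rho > p^i_A - \lambda/2,
\end{cases}
\]

and hence

\[
\phi^{i,m}_\infty(\rho) \cong
\begin{cases}
0 & \text{if } \rho < 1 - p^i_B + \lambda/2, \\
\pi^i & \text{if } 1 - p^i_B + \lambda/2 < \rho < p^i_A - \lambda/2, \\
1 & \text{if } \rho > p^i_A - \lambda/2.
\end{cases}
\]

Notice that when $\rho \in [p^2_A - \lambda/2, p^2_A + \lambda/2]$, we have $\rho < p^1_A - \lambda/2$, so that $\phi^{2,m}_\infty(\rho) \cong 1$,
$\phi^{1,m}_\infty(\rho) \cong \pi^1$ and $|\phi^{1,m}_\infty(\rho) - \phi^{2,m}_\infty(\rho)| \cong 1 - \pi^1$. Similarly, when
$\rho \in [1 - p^1_B - \lambda/2, 1 - p^1_B + \lambda/2]$, we have $\phi^{1,m}_\infty(\rho) \cong 0$ and $\phi^{2,m}_\infty(\rho) \cong \pi^2$, so that
$|\phi^{1,m}_\infty(\rho) - \phi^{2,m}_\infty(\rho)| \cong \pi^2$. In order to complete
the proof of the theorem, we then pick $\bar Z = \min\{\pi^2, 1 - \pi^1\}/2$. In that case,

\[
\lim_{m\to\infty} \Pr^{1,m}\big(|\phi^{1,m}_\infty(\rho) - \phi^{2,m}_\infty(\rho)| > \bar Z\big) \geq \lim_{m\to\infty} \Pr^{1,m}\big(\rho \in [1 - p^1_B - \lambda/2,\; 1 - p^1_B + \lambda/2]\big) = 1 - \pi^1 > 0,
\]

and

\[
\lim_{m\to\infty} \Pr^{2,m}\big(|\phi^{1,m}_\infty(\rho) - \phi^{2,m}_\infty(\rho)| > \bar Z\big) \geq \lim_{m\to\infty} \Pr^{2,m}\big(\rho \in [p^2_A - \lambda/2,\; p^2_A + \lambda/2]\big) = \pi^2 > 0,
\]

completing the proof. ■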

Proof  of  Theorem  6. 

(Proof of Part 1) This part immediately follows from Lemma 2, as each $\pi^i_{k'} f^i_{A^{k'}}(p(s))$ is positive, and $\pi^i_k f^i_{A^k}(p(s))$ is finite.

(Proof of Part 2) Assume $F^1_\theta = F^2_\theta$ for each $\theta \in \Theta$. Then, by Lemma 2, $\phi^1_{k,\infty}(p) - \phi^2_{k,\infty}(p) = 0$ if and only if $(T_k(\pi^1) - T_k(\pi^2))' \, T_k((f_\theta(p))_{\theta \in \Theta}) = 0$. The latter equality has probability 0 under both probability measures $\Pr^1$ and $\Pr^2$ by hypothesis.

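Proof of Theorem 7. Pick sequences $\hat p^{i,m}_\theta \to p_\theta$ and $\epsilon > 0$ such that

$$\|\hat p^{1,m}_\theta - \hat p^{2,m}_{\theta'}\| > \epsilon/m$$

for all $\theta, \theta'$ (including $\theta = \theta'$). For each $(\theta, i)$, define

$$D^{i,m}_\theta = \{p \in \Delta(L) : \|p - \hat p^{i,m}_\theta\| < \epsilon/m\},$$

which will be the set of likely frequencies at state $\theta$ according to $i$. Notice that $D^{i,m}_\theta \cap D^{i',m}_{\theta'} \neq \emptyset$ iff $\theta = \theta'$ and $i = i'$. Define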


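$$f^i_{\theta,m}(p) = \begin{cases} x^i_{\theta,m} & \text{if } p \in D^{i,m}_\theta, \\ 1/m & \text{otherwise,} \end{cases}$$

where $x^i_{\theta,m}$ is normalized so that $f^i_{\theta,m}$ is a probability density function. By construction of the sequences $f^i_{\theta,m}$ and $\hat p^{i,m}_\theta$, $F^i_{\theta,m} \to \delta_{p_\theta}$ for each $(\theta, i)$. We will show that agreement is discontinuous under $\{F^i_{\theta,m}\}$. Now

$$\phi^i_{\infty,\theta,m}(p) = \frac{1}{1 + \dfrac{1 - \pi^i_\theta}{\pi^i_\theta} \, \dfrac{1}{m \, x^i_{\theta,m}}}$$

if $p \in D^{i,m}_\theta$ for some $\theta$, and $\phi^i_{\infty,\theta,m}(p) = \pi^i_\theta$ otherwise. Note that $\phi^i_{\infty,\theta,m}(p) \to 1$ if $p \in D^{i,m}_\theta$. Moreover, since the sets $D^{i,m}_\theta$ are disjoint (as we have seen above), $\phi^j_{\infty,\theta,m}(p) = \pi^j_\theta$ for $j \neq i$ when $p \in D^{i,m}_\theta$. Hence, there exists $\bar m$ such that for any $m > \bar m$ and any $p \in D^{i,m} = \bigcup_\theta D^{i,m}_\theta$,

$$\|\phi^1_{\infty,m}(p) - \phi^2_{\infty,m}(p)\| > \bar\epsilon,$$

where $\bar\epsilon = \min_{j,\theta} (1 - \pi^j_\theta)/2$. But for each $\theta$, $\Pr^{i,m}(p \in D^{i,m}_\theta \mid \theta) > 1 - 1/m$, showing that $\Pr^{i,m}(p \in D^{i,m}) > 1 - 1/m$. Therefore,

$$\lim_{m\to\infty} \Pr^{i,m}\left(\|\phi^1_{\infty,m} - \phi^2_{\infty,m}\| > \bar\epsilon\right) = 1.$$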
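
The mechanism behind this proof can be illustrated in a simplified setting. The sketch below is a hedged, one-dimensional, two-state illustration and not the paper's general construction: frequencies live in $[0, 1]$, the agent's likely set has a hypothetical measure `width`, and all function names are assumptions. On its own likely set an agent's density is a tall spike against a flat $1/m$ background for the other state, so that agent's posterior approaches 1, while an agent for whom the frequency lies outside all of its likely sets sees equal likelihoods and keeps its prior.

```python
def spike_height(width, m):
    """Height x such that a density equal to x on a set of measure `width`
    and 1/m on the rest of [0, 1] integrates to one."""
    return (1.0 - (1.0 - width) / m) / width


def posterior_on_own_set(prior, width, m):
    """Posterior on a state for the agent whose likely set contains p:
    likelihood x for that state, 1/m for the other state."""
    x = spike_height(width, m)
    return prior * x / (prior * x + (1.0 - prior) / m)


def posterior_off_all_sets(prior):
    """Outside all of an agent's likely sets every likelihood equals 1/m,
    so Bayes' rule returns the prior unchanged."""
    return prior
```

For instance, with `prior = 0.5`, `width = 2e-4` and `m = 1000` (all hypothetical), the first agent's posterior is within $10^{-3}$ of 1 while the second agent's remains at 0.5.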


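Proof of Theorem 8. Our proof utilizes the following two lemmas.

Lemma A.

$$\lim_{m\to\infty} \phi^i_{k,\infty,m}(p) = \frac{1}{1 + \sum_{k' \neq k} \dfrac{\pi^i_{k'}}{\pi^i_k} \, R\left(p - p(i, A^{k'}),\ p - p(i, A^k)\right)}.$$

Proof. By condition (i), $\lim_{m\to\infty} c(i, A^k, m) = 1$ for each $i$ and $k$. Hence, for every distinct $k$ and $k'$,

$$\lim_{m\to\infty} \frac{f^i_{A^{k'}}(p)}{f^i_{A^k}(p)} = \lim_{m\to\infty} \frac{c(i, A^{k'}, m)}{c(i, A^k, m)} \, \lim_{m\to\infty} \frac{f\left(m(p - p(i, A^{k'}))\right)}{f\left(m(p - p(i, A^k))\right)} = R\left(p - p(i, A^{k'}),\ p - p(i, A^k)\right).$$

Then, Lemma A follows from Lemma 2. ■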
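
A small numerical sketch may help to see what the limit in Lemma A looks like for different tail behaviors. It is only an illustration under assumptions: it treats the scalar case, reads $R(x, y)$ as the limit of $f(mx)/f(my)$ as in the proof above, and uses a standard normal density and an unnormalized power-law density $(1 + |x|)^{-\alpha}$ as hypothetical stand-ins for rapidly and regularly varying tails. All function names are invented for the example.

```python
import math


def log_normal_density(x):
    """Log of the standard normal density (rapidly varying tails)."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


def log_power_density(x, alpha=3.0):
    """Log of the unnormalized density (1 + |x|)^(-alpha); the normalizing
    constant cancels in ratios (regularly varying tails)."""
    return -alpha * math.log1p(abs(x))


def tail_ratio(log_f, x, y, m):
    """f(m x) / f(m y), computed in logs to avoid underflow."""
    return math.exp(log_f(m * x) - log_f(m * y))


def limiting_belief(prior_ratios, r_values):
    """1 / (1 + sum over k' of (prior ratio) * R), the form in Lemma A."""
    return 1.0 / (1.0 + sum(a * r for a, r in zip(prior_ratios, r_values)))
```

For the normal density the ratio at $x = 2$, $y = 1$ collapses to zero as $m$ grows, so the limiting belief is 1. For the power-law density the ratio approaches $(|x|/|y|)^{-\alpha} = 2^{-3}$, so the limiting belief stays strictly below 1.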

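Lemma B. For any $\epsilon > 0$ and $h > 0$, there exists $\bar m$ such that for each $m > \bar m$, $k \leq K$, and each $p(s)$ with $\|p(s) - p(i, A^k)\| < h/m$,

$$\left| \phi^j_{k,\infty,m}(p(s)) - \lim_{m\to\infty} \phi^j_{k,\infty,m}(p(i, A^k)) \right| < \epsilon. \tag{26}$$

Proof. Since, by hypothesis, $R$ is continuous at each $(p(i, \theta) - p(j, \theta'),\ p(i, \theta) - p(j, \theta))$, by Lemma A, there exists $h' > 0$ such that

$$\left| \lim_{m\to\infty} \phi^j_{k,\infty,m}(p(s)) - \lim_{m\to\infty} \phi^j_{k,\infty,m}(p(i, A^k)) \right| < \epsilon/2 \tag{27}$$

and by condition (iii), there exists $\bar m > h/h'$ such that

$$\left| \phi^j_{k,\infty,m}(p(s)) - \lim_{m\to\infty} \phi^j_{k,\infty,m}(p(s)) \right| < \epsilon/2 \tag{28}$$

holds uniformly in $\|p(s) - p(i, A^k)\| < h'$. By the triangle inequality, (27) and (28) then imply (26).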


Lemma C. $\lim_{m\to\infty} \left( \phi^i_{k,\infty,m}(p(i, A^k)) - \phi^j_{k,\infty,m}(p(i, A^k)) \right) = 0$ iff $R\left(p(i, A^k) - p(j, A^{k'}),\ p(i, A^k) - p(j, A^k)\right) = 0$ for each $k' \neq k$.

Proof. Since $R\left(p(i, A^k) - p(i, A^{k'}),\ 0\right) = 0$ for each $k' \neq k$ (by condition (i)), Lemma A implies that $\lim_{m\to\infty} \phi^i_{k,\infty,m}(p(i, A^k)) = 1$. Hence, $\lim_{m\to\infty} \left( \phi^i_{k,\infty,m}(p(i, A^k)) - \phi^j_{k,\infty,m}(p(i, A^k)) \right) = 0$ if and only if $\lim_{m\to\infty} \phi^j_{k,\infty,m}(p(i, A^k)) = 1$. Since each ratio $\pi^j_{k'}/\pi^j_k$ is positive, by Lemma A, the latter holds if and only if $R\left(p(i, A^k) - p(j, A^{k'}),\ p(i, A^k) - p(j, A^k)\right) = 0$ for each $k' \neq k$. ■

(Proof of Part 1) Fix $\epsilon > 0$ and $\delta > 0$. We will find $\bar m \in \mathbb{N}$ such that

$$\Pr^i\left(\|\phi^1_{\infty,m}(s) - \phi^2_{\infty,m}(s)\| > \epsilon\right) < \delta \qquad (\forall m > \bar m,\ i = 1, 2).$$

Fix any $i$ and $k$. Since each $\pi^i_{k'}/\pi^i_k$ is finite, by Lemma 2, there exists $\epsilon' > 0$ such that $\phi^i_{k,\infty,m}(p(s)) > 1 - \epsilon$ whenever $f^i_{A^{k'}}(p(s))/f^i_{A^k}(p(s)) < \epsilon'$ holds for every $k' \neq k$. Now, by (i), there exists $h_{0,k} > 0$ such that

$$\Pr^i\left(\|p(s) - p(i, A^k)\| \leq h_{0,k}/m \mid \theta = A^k\right) = \int_{\|x\| \leq h_{0,k}} f(x)\,dx > 1 - \delta.$$

Let

$$Q_{k,m} = \{p \in \Delta(L) : \|p - p(i, A^k)\| \leq h_{0,k}/m\}$$

and $\kappa \equiv \min_{\|x\| \leq h_{0,k}} f(x) > 0$. By (i), there exists $h_{1,k} > 0$ such that, whenever $\|x\| > h_{1,k}$, $f(x) < \epsilon' \kappa/2$. There exists a sufficiently large constant $m_{1,k}$ such that for any $m > m_{1,k}$, $p(s) \in Q_{k,m}$, and any $k' \neq k$, we have $\|p(s) - p(i, A^{k'})\| > h_{1,k}/m$, and

$$\frac{f\left(m(p(s) - p(i, A^{k'}))\right)}{f\left(m(p(s) - p(i, A^k))\right)} < \frac{\epsilon' \kappa}{2} \cdot \frac{1}{\kappa} = \frac{\epsilon'}{2}.$$

Moreover, since $\lim_{m\to\infty} c(i, \theta, m) = 1$ for each $i$ and $\theta$, there exists $m_{2,k} > m_{1,k}$ such that $c(i, A^{k'}, m)/c(i, A^k, m) < 2$ for every $k' \neq k$ and $m > m_{2,k}$. This implies

$$f^i_{A^{k'}}(p(s))/f^i_{A^k}(p(s)) < \epsilon',$$

establishing that

$$\phi^i_{k,\infty,m}(p(s)) > 1 - \epsilon. \tag{29}$$


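Now, for $j \neq i$, assume that $R\left(p(i, \theta) - p(j, \theta'),\ p(i, \theta) - p(j, \theta)\right) = 0$ for each distinct $\theta$ and $\theta'$. Then, by Lemma A, $\lim_{m\to\infty} \phi^j_{k,\infty,m}(p(i, A^k)) = 1$, and hence by Lemma B, there exists $m_{3,k} > m_{2,k}$ such that for each $m > m_{3,k}$ and $p(s) \in Q_{k,m}$,

$$\phi^j_{k,\infty,m}(p(s)) > 1 - \epsilon. \tag{30}$$

Notice that when (29) and (30) hold, we have $\|\phi^1_{\infty,m}(s) - \phi^2_{\infty,m}(s)\| < \epsilon$. Then, setting $\bar m = \max_k m_{3,k}$, we obtain the desired inequality for each $m > \bar m$:

$$\begin{aligned} \Pr^i\left(\|\phi^1_{\infty,m}(s) - \phi^2_{\infty,m}(s)\| < \epsilon\right) &= \sum_{k \leq K} \Pr^i\left(\|\phi^1_{\infty,m}(s) - \phi^2_{\infty,m}(s)\| < \epsilon \mid \theta = A^k\right) \Pr^i(\theta = A^k) \\ &\geq \sum_{k \leq K} \Pr^i\left(p(s) \in Q_{k,m} \mid \theta = A^k\right) \Pr^i(\theta = A^k) \\ &> \sum_{k \leq K} (1 - \delta)\, \pi^i_k \\ &= 1 - \delta. \end{aligned}$$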
(Proof of Part 2) Assume that $R\left(p(i, \theta) - p(j, \theta'),\ p(i, \theta) - p(j, \theta)\right) \neq 0$ for each distinct $\theta$ and $\theta'$. We will find $\epsilon > 0$ such that for each $\delta > 0$, there exists $\bar m \in \mathbb{N}$ such that

$$\Pr^i\left(\|\phi^1_{\infty,m}(s) - \phi^2_{\infty,m}(s)\| > \epsilon\right) > 1 - \delta \qquad (\forall m > \bar m,\ i = 1, 2).$$

Now, since each $\pi^j_{k'}/\pi^j_k$ is positive, Lemma A implies that $\lim_{m\to\infty} \phi^j_{k,\infty,m}(p(i, A^k)) < 1$ for each $k$. Let

$$\epsilon = \min_k \left\{ 1 - \lim_{m\to\infty} \phi^j_{k,\infty,m}(p(i, A^k)) \right\} / 3.$$

Then, by Part 1, for each $k$, there exists $m_{2,k}$ such that for every $m > m_{2,k}$ and $p(s) \in Q_{k,m}$, we have $\phi^i_{k,\infty,m}(p(s)) > 1 - \epsilon$. By Lemma B, there also exists $m_{5,k} > m_{2,k}$ such that for every $m > m_{5,k}$ and $p(s) \in Q_{k,m}$,

$$\phi^j_{k,\infty,m}(p(s)) < \lim_{m\to\infty} \phi^j_{k,\infty,m}(p(i, A^k)) + \epsilon \leq 1 - 2\epsilon < \phi^i_{k,\infty,m}(p(s)) - \epsilon.$$

This implies that $\|\phi^1_{\infty,m}(p(s)) - \phi^2_{\infty,m}(p(s))\| > \epsilon$. Setting $\bar m = \max_k m_{5,k}$ and changing $\|\phi^1_{\infty,m}(s) - \phi^2_{\infty,m}(s)\| < \epsilon$ at the end of the proof of Part 1 to $\|\phi^1_{\infty,m}(s) - \phi^2_{\infty,m}(s)\| > \epsilon$, we obtain the desired inequality. ■